MOOD DISORDER GENE



The invention is concerned with the determination
of genetic factors associated with psychiatric health,
with particular reference to a human gene or genes
which contribute to or are responsible for the
manifestation of a mood disorder or a related disorder
in affected individuals. In particular, although not
exclusively, the invention provides a method of
identifying and characterising such a gene or genes
from human chromosome 18, as well as genes so
identified and their expression products. The
invention is also concerned with methods of
determining the genetic susceptibility of an
individual to a mood disorder or related disorder. By
mood disorders or related disorders is meant the
following disorders as defined in the Diagnostic and
Statistical Manual of Mental Disorders, version 4
(DSM-IV) taxonomy (DSM-IV codes in parentheses): mood
disorders (296.XX, 300.4, 311, 301.13, 295.70),
schizophrenia and related disorders (295.XX,
297.1, 298.8, 297.3, 298.9), anxiety disorders (300.XX,
309.81, 308.3), adjustment disorders (309.XX) and
personality disorders (codes 301.XX).

The methods of the invention are particularly
exemplified in relation to genetic factors associated
with a family of mood disorders known as bipolar (BP)
spectrum disorders.

Bipolar disorder (BP) is a severe psychiatric
condition that is characterized by disturbances in
mood, ranging from an extreme state of elation (mania)
to a severe state of dysphoria (depression). Two types
of bipolar illness have been described: type I BP
illness (BPI) is characterized by major depressive
episodes alternating with phases of mania, and type II
BP illness (BPII) is characterized by major depressive
episodes alternating with phases of hypomania.
Relatives of BP probands have an increased risk for
BP, unipolar disorder (patients only experiencing
depressive episodes; UP), cyclothymia (minor
depression and hypomania episodes; CY) as well as for
schizoaffective disorders of the manic (SAm) and
depressive (SAd) type. Based on these observations BP,
CY, UP and SA are classified as BP spectrum disorders.

The involvement of genetic factors in the etiology of
BP spectrum disorders was suggested by family, twin
and adoption studies (Tsuang and Faraone (1990), The
Genetics of Mood Disorders, Baltimore, The Johns
Hopkins University Press). However, the exact pattern
of transmission is unknown. In some studies, complex
segregation analysis supports the existence of a
single major locus for BP (Spence et al. (1995), Am J.
Med. Genet. (Neuropsych. Genet.) 60 pp 370-376). Other
researchers propose a liability-threshold model, in
which the liability to develop the disorder results
from the additive combination of multiple genetic and
environmental effects (McGuffin et al. (1994),
Affective Disorders; Seminars in Psychiatric Genetics,
Gaskell, London pp 110-127).

Due to the complex mode of inheritance,
parametric and nonparametric linkage strategies are
applied in families in which BP disorder appears to be
transmitted in a Mendelian fashion. Early linkage
findings on chromosomes 11p15 (Egeland et al. (1987),
Nature 325 pp 783-787) and Xq27-q28 (Mendlewicz et al.
(1987) The Lancet 1 pp 1230-1232; Baron et al. (1987)
Nature 326 pp 289-292) have been controversial and
could initially not be replicated (Kelsoe et al.
(1989) Nature 342 pp 238-243; Baron et al. (1993)
Nature Genet 3 pp 49-55). With the development of a
human genetic map saturated with highly polymorphic
markers and the continuous development of data
analysis techniques, numerous new linkage searches
were started. In several studies, evidence or
suggestive evidence for linkage to particular regions
on chromosomes 4, 12, 18, 21 and X was found
(Blackwood et al. (1996) Nature Genetics 12 pp 427,
Craddock et al. (1994) Brit J. Psychiatry 164 pp
355-358, Berrettini et al. (1994), Proc Natl Acad Sci
USA 91 pp 5918-5921, Straub et al. (1994) Nature
Genetics 8 pp 291-296 and Pekkarinen et al. (1995)
Genome Research 5 pp 105-115). In order to test the
validity of the reported linkage results, these
findings have to be replicated in other, independent
studies.

Recently, linkage of bipolar disorder to the
pericentric region on chromosome 18 was reported
(Berrettini et al. 1994). Also a ring chromosome 18
with break-points and deleted regions at 18pter-p11
and 18q23-qter was reported in three unrelated
patients with BP illness or related disorders
(Craddock et al. 1994). The chromosome 18p linkage
was replicated by Stine et al. (1995) Am J Hum Genet
57 pp 1384-1394, who also reported suggestive evidence
for a locus on 18q21.2-q21.32 in the same study.
Interestingly, Stine et al. observed a
parent-of-origin effect: the evidence of linkage was the
strongest in the paternal pedigrees, in which the
proband's father or one of the proband's father's sibs
is affected.

In an independent replication study, the present
inventors tested linkage with chromosome 18 markers in
10 Belgian families with a bipolar proband. To
localize causative genes the linkage analysis
likelihood method was used in these families. This
method studies within a family the segregation of a
defined disease phenotype with that of polymorphic
genetic markers distributed in the human genome. The
likelihood ratio of observing cosegregation of the
disease and a genetic marker under linkage versus no
linkage is calculated and the log of this ratio or the
log of the odds is the LOD score statistic z. A LOD
score of 3 (or likelihood ratio of 1000 or greater) is
taken as significant statistical evidence for linkage.
In the inventors' study no evidence for linkage to the
pericentromeric regions was found, but in one of the
families, MAD31, a Belgian family of a BPII proband,
suggestive linkage was found with markers located at
18q21.33-q23 (De bruyn et al. (1996) Biol Psychiatry
39 pp 679-688). Multipoint linkage analysis gave the
highest LOD score in the interval between STR (Short
Tandem Repeats) polymorphisms D18S51 and D18S61, with
a maximum multipoint LOD score of +1.34. Simulation
studies indicated that this LOD score is within the
range of what can be expected for a linked marker
given the information available in the family.
Likewise, an affected sib-pair analysis also rejected
the null-hypothesis of nonlinkage for several of the
markers tested. Two other groups also found evidence
for linkage of bipolar disorder to 18q (Freimer et al.
(1996) Nature Genetics 12 pp 436-441, Coon et al.
(1996) Biol Psychiatry 39 pp 689 to 696). Although
the candidate regions in the different studies do not
entirely overlap, they all suggest the presence of a
susceptibility locus at 18q21-q23.
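
The LOD score statistic described above can be illustrated numerically. The following Python fragment is a minimal sketch only: it converts between a likelihood ratio and a LOD score, and evaluates a two-point LOD score for the simplified textbook case of phase-known meioses. The function names and the phase-known simplification are assumptions made for illustration and do not reproduce the multipoint analysis actually used by the inventors.

```python
import math


def lod_from_likelihood_ratio(ratio: float) -> float:
    """Return the LOD score z, the base-10 log of the likelihood ratio."""
    if ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(ratio)


def likelihood_ratio_from_lod(z: float) -> float:
    """Return the likelihood ratio (odds for linkage) implied by a LOD score."""
    return 10.0 ** z


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses (illustrative assumption).

    Compares the likelihood at recombination fraction theta with the
    likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    print(lod_from_likelihood_ratio(1000.0))
    print(round(likelihood_ratio_from_lod(1.34), 1))
```

On this reading, the maximum multipoint LOD score of +1.34 reported for family MAD31 corresponds to odds of roughly 22 to 1 in favour of linkage, below the conventional threshold of 1000 to 1.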

The inventors have now carried out further
investigations into the 18q chromosomal region in
family MAD31. By analysis of cosegregation of bipolar
disease in MAD31 with twelve STR polymorphic markers
previously located between the aforementioned markers
D18S51 and D18S61 and subsequent LOD score analysis as
described above, the inventors have further refined
the candidate region of chromosome 18 in which a gene
associated with mood disorders such as bipolar
spectrum disorders may be located and have constructed
a physical map. The region in question may thus be
used to locate, isolate and sequence a gene or genes
which influence psychiatric health.

The inventors have also constructed a YAC (yeast
artificial chromosome) contig map of the candidate
region to determine the relative order of the twelve
STR markers mapped by the cosegregation analysis and
they have identified seven clones from the YAC library
incorporating the candidate region.

A number of procedures can be applied to the
identified YAC clones and, where applicable, to the
DNA of an individual afflicted with a mood disorder as
defined herein, in the process of identifying and
characterising the relevant gene or genes. For
example, the inventors have used YAC clones spanning
the region of interest in chromosome 18 to identify by
CAG or CTG fragmentation novel genes that are
allegedly involved in the manifestation of mood
disorders or related disorders.

Other procedures can also be applied to the said
YAC clones to identify candidate genes as discussed
below.

Once candidate genes have been identified it is
possible to assess the susceptibility of an individual
to a mood disorder or related disorder by detecting
the presence of a polymorphism associated with a mood
disorder or related disorder in such genes.

Accordingly, in a first aspect the present
invention comprises the use of an 8.9 cM region of
human chromosome 18q disposed between polymorphic
markers D18S68 and D18S979 or a fragment thereof for
identifying at least one human gene, including mutated
and polymorphic variants thereof, which is associated
with mood disorders or related disorders as defined
above. As will be described below, the present
inventors have identified this candidate region of
chromosome 18q for such a gene, by analysis of
cosegregation of bipolar disease in family MAD31 with
twelve STR polymorphic markers previously located between
D18S51 and D18S61 and subsequent LOD score analysis.

In a second aspect the invention comprises the
use of a YAC clone comprising a portion of human
chromosome 18q disposed between polymorphic markers
D18S60 and D18S61 for identifying at least one human
gene, including mutated or polymorphic variants
thereof, which is associated with mood disorders or
related disorders as defined above. D18S60 is close
to D18S51 so the particular YAC clones for use are
those which have an artificial chromosome spanning the
candidate region of human chromosome 18q between
polymorphic markers D18S51 and D18S61 as identified by
the present inventors in their earlier paper (De bruyn
et al. (1996)).

Particular YACs covering the candidate region
which may be used in accordance with the present
invention are 961.h.9, 94a.O., 766.f.12, "l.c.7,
907.e.1, 753-g-8 and 717 <U, preferred ones being
961.h.9, 766.f.12 and 907.e.1 since these have the minimum
tiling path across the candidate region. Suitable
clones for use are those having an artificial
chromosome spanning the refined candidate region
between D18S68 and D18S979.



There are a number of methods which can be
applied to the candidate regions of chromosome 18q as
defined above, whether or not present in a YAC, to
identify a candidate gene or genes associated with
mood disorders or related disorders. For example, it
has previously been demonstrated that an apparent
association exists between the presence of
trinucleotide repeat expansions (TRE) in the human
genome and the phenomenon of anticipation of mood
disorders (Lindblad et al. (1995), Neurobiology of
Disease 2: 55-62 and O'Donovan et al. (1995), Nature
Genetics 10: 380-381).

Accordingly, in a third aspect the present
invention comprises a method of identifying at least
one human gene, including mutated and polymorphic
variants thereof, which is associated with a mood
disorder or related disorder as defined herein which
comprises detecting nucleotide triplet repeats in the
region of human chromosome 18q disposed between
polymorphic markers D18S68 and D18S979.

An alternative method of identifying said gene or
genes comprises fragmenting a YAC clone comprising a
portion of human chromosome 18q disposed between
polymorphic markers D18S60 and D18S61, for example one
or more of the seven aforementioned YAC clones, and
detecting any nucleotide triplet repeats in said
fragments. Nucleic acid probes comprising at least 5
and preferably at least 10 CTG and/or CAG triplet
repeats are a suitable means of detection when
appropriately labelled. Trinucleotide repeats may
also be determined using the known RED (repeat
expansion detection) system (Schalling et al. (1993),
Nature Genetics 4 pp 135-139).
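
Where the sequence of a fragment is available, the detection of CAG or CTG triplet repeats described above has a simple computational analogue. The following Python fragment is a minimal sketch under that assumption; it is not the hybridization or RED procedure itself. The function name, the example sequence and the default threshold of 5 repeats (taken from the minimum probe length mentioned above) are illustrative choices.

```python
import re
from typing import List, Tuple


def find_triplet_repeats(seq: str,
                         units: Tuple[str, ...] = ("CAG", "CTG"),
                         min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each uninterrupted repeat run.

    Only perfect tandem runs of at least min_repeats copies are reported.
    CTG is the reverse complement of CAG, so scanning one strand for both
    units also covers CAG runs located on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for unit in units:
        pattern = re.compile(r"(?:%s){%d,}" % (unit, min_repeats))
        for match in pattern.finditer(seq):
            hits.append((match.start(), unit, len(match.group(0)) // 3))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence: a 6-copy CAG run, a 3-copy CTG run
    # (below threshold) and a 10-copy CTG run.
    example = "AAT" + "CAG" * 6 + "GGT" + "CTG" * 3 + "TT" + "CTG" * 10
    for start, unit, count in find_triplet_repeats(example):
        print(start, unit, count)
```

Raising `min_repeats` to 10 would mirror the preferred probe length stated above.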

In a fourth embodiment the invention comprises a
method of identifying at least one gene, including
mutated and polymorphic variants thereof, which is
associated with a mood disorder or related disorder
and which is present in a YAC clone spanning the
region of human chromosome 18q between polymorphic
markers D18S60 and D18S61, the method comprising the
step of detecting the expression product of a gene
incorporating nucleotide triplet repeats by use of an
antibody capable of recognising a protein with an
amino acid sequence comprising a string of at least 8,
but preferably at least 12, continuous glutamine
residues. Such a method may be implemented by
subcloning YAC DNA, for example from the seven
aforementioned YAC clones, into a human DNA expression
library. A preferred means of detecting the relevant
expression product is by use of a monoclonal antibody,
in particular mAB 1C2, the preparation and properties
of which are described in International Patent
Application Publication No WO 97/17445.
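
The criterion of at least 8, and preferably at least 12, continuous glutamine residues can also be checked on a predicted amino acid sequence. CAG codes for glutamine, so an in-frame CAG repeat yields such a tract. The following Python fragment is a minimal sketch of that check only; it does not model antibody recognition, and the function name and example protein are hypothetical.

```python
import re
from typing import List, Tuple


def polyglutamine_tracts(protein: str, min_len: int = 8) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of at least min_len glutamines (Q)."""
    pattern = re.compile(r"Q{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical protein: a 7-residue tract (too short) and a 12-residue tract.
    example = "MKT" + "Q" * 7 + "A" + "Q" * 12 + "LL"
    print(polyglutamine_tracts(example))
    print(polyglutamine_tracts(example, min_len=12))
```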



As will be described in detail below, in order to
identify candidate genes containing triplet repeats,
the inventors have carried out direct CAG or CTG
fragmentation of YACs 961.h.9, 766.f.12 and 907.e.1,
comprising a portion of human chromosome 18q disposed
between polymorphic markers D18S60 and D18S61, and
have identified a number of sequences containing CAG
or CTG repeats, whose abnormal expansion may be
involved in genetic susceptibility to a mood disorder
or related disorder.

Accordingly, in a fifth aspect, the invention
provides a nucleic acid comprising the sequence of
nucleotides shown in any one of Figures 15a, 16a, 17a,
or 18a.

In a further aspect, the invention provides a
protein comprising an amino acid sequence encoded by
the sequence of nucleotides shown in any one of
Figures 15a, 16a, 17a, or 18a.

In yet a further aspect the invention provides a
mutated nucleic acid comprising a sequence of
nucleotides which differs from the sequence of
nucleotides shown in any one of Figures 15a, 16a, 17a,
or 18a only in the extent of trinucleotide repeats.

Also provided by the invention is a mutated protein
comprising an amino acid sequence encoded by a
sequence of nucleotides which differs from the sequence
of nucleotides shown in any one of Figures 15a, 16a,
17a, or 18a only in the extent of trinucleotide
repeats.

It is to be understood that the invention also
contemplates nucleotide sequences having at least 75%
and preferably at least 80% homology with any of the
sequences described above and having functional
identity with any of said sequences. The homology is
calculated as described by Altschul et al. (1997)
Nucleic Acids Res. 25: 3389-3402, Karlin et al. (1990)
Proc Natl Acad Sci USA 87: 2264-68 and Karlin et al.
(1993) Proc Natl Acad Sci USA 90: 5873-5877. Also
contemplated are amino acid sequences which differ
from the above described sequences only in
conservative amino acid changes. Suitable changes are
well known to those skilled in the art.

Knowledge of the sequences described above can be
used to design assays to determine the genetic
susceptibility of an individual to a mood disorder or
related disorder.

Accordingly, in a further aspect the invention
provides a method for determining the susceptibility
of an individual to a mood disorder or related
disorder which comprises the steps of:

a) obtaining a DNA sample from said
individual;

b) providing primers suitable for the
amplification of a nucleotide sequence comprised in
the sequence shown in any one of Figures 15a, 16a, 17a
or 18a, said primers flanking the trinucleotide repeats
comprised in said sequence;

c) applying said primers to the said DNA
sample and carrying out an amplification reaction;

d) carrying out the same amplification
reaction on a DNA sample from a control individual;
and

e) comparing the results of the
amplification reaction for the said individual and for
the said control individual;

wherein the presence of an amplified fragment
from said individual which is larger in size than that
of said control individual is an indication of the
presence of a susceptibility to a mood disorder or
related disorder of said individual.

By control individual is meant an individual who is
not affected by a mood disorder or related disorder
and does not have a family history of mood disorders
or related disorders.

Preferable primers to use in this method are those
shown in Figure 15b, 16b, 17b or 18b but other
suitable primers may be utilised.
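
The comparison in steps a) to e) above can be sketched in silico. The following Python fragment assumes that the sequence of a single allele is known for the individual and for the control, and computes the expected size of the fragment amplified between two flanking primers. The primer and flanking sequences are invented for illustration and are not those of Figures 15b to 18b. The function names are likewise hypothetical, and exact primer matching is a simplification.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the size of the fragment amplified from template, or None.

    The forward primer must match the template strand exactly and the
    reverse primer must match the opposite strand downstream of it.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return end + len(reverse_site) - start


def is_larger_than_control(individual: str, control: str,
                           forward: str, reverse: str) -> bool:
    """Return True if the individual's fragment is larger than the control's."""
    size_individual = amplicon_length(individual, forward, reverse)
    size_control = amplicon_length(control, forward, reverse)
    if size_individual is None or size_control is None:
        raise ValueError("primers do not amplify both templates")
    return size_individual > size_control


if __name__ == "__main__":
    # Hypothetical flanking sequences and repeat lengths.
    left, right = "GATTACAGGCTTAACCGT", "TTGACCGGATATCGCAAT"
    fwd, rev = left, reverse_complement(right)
    control_allele = left + "CAG" * 10 + right
    test_allele = left + "CAG" * 45 + right
    print(amplicon_length(control_allele, fwd, rev))
    print(amplicon_length(test_allele, fwd, rev))
    print(is_larger_than_control(test_allele, control_allele, fwd, rev))
```

In practice each individual carries two alleles and fragment sizes are read from a gel or similar readout, so this sketch covers only the size comparison of step e).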

In a further aspect the invention provides a
method of determining the susceptibility of an
individual to a mood disorder or related disorder
which method comprises the steps of:

a) obtaining a protein sample from said
individual; and

b) detecting the presence of a protein
comprising an amino acid sequence encoded by a
sequence of nucleotides which differs from the sequence
of nucleotides shown in any one of Figures 15a, 16a,
17a, or 18a only in the extent of trinucleotide
repeats

wherein the presence of said protein is an
indication of the presence of a susceptibility to a
mood disorder or related disorder of said individual.

Preferably, the aforesaid protein is detected by
utilising an antibody that is capable of recognising a
string of at least 8 continuous glutamines as, for
example, the mAB 1C2 antibody.

The nucleic acid molecules according to the
invention may be advantageously included in an
expression vector, which may be introduced into a host
cell of prokaryotic or eukaryotic origin. Suitable
expression vectors include plasmids, which may be used
to express foreign DNA in bacterial or eukaryotic host
cells, viral vectors, yeast artificial chromosomes or
mammalian artificial chromosomes. The vector may be
transfected or transformed into host cells using
suitable methods known in the art such as, for
example, electroporation, microinjection, infection,
lipofection and direct uptake. Such methods are
described in more detail, for example, by Sambrook et
al., "Molecular Cloning: A Laboratory Manual", 2nd ed.
(1989) and by Ausubel et al., "Current Protocols in
Molecular Biology", (1994).

Also provided by the invention is a host cell,
tissue or organism comprising the expression vector
according to the invention. The invention further
provides a transgenic host cell, tissue or organism
comprising a transgene capable of encoding the
proteins of the invention, which may comprise a
genomic DNA or a cDNA. The transgene may be present in
the transgenic host cell, tissue or organism either
stably integrated into the genome or in an
extrachromosomal state.

A nucleic acid molecule comprising a nucleotide
sequence shown in any one of Figures 15a, 16a, 17a or
18a as well as the protein encoded by it may be
therapeutically used in the treatment of mood
disorders or related disorders in patients who
present a trinucleotide repeat expansion (TRE) in at
least one of the aforesaid sequences.

Accordingly, in another of its aspects the
invention provides the above described nucleic acid
molecules and proteins for use as medicaments for the
treatment of individuals with a mood disorder or
related disorder. Preferably, the nucleic acid or the
protein is present in an appropriate carrier or
delivery vehicle. As an example, the nucleic acid
inserted into a vector, for example a plasmid or a
viral vector, may be transfected into a mammalian cell
such as a somatic cell or a mammalian germ line cell,
as described above. The cell to be transfected can be
present in a biological sample obtained from the
patient, for example blood or bone marrow, or can be
obtained from cell culture. After transfection the
sample may be returned or readministered to a patient
according to methods known to those practised in the
art, for example, methods as described in Kasid et
al., Proc. Natl. Acad. Sci. USA (1990) 87: 473;
Rosenberg et al. (1990) New Eng. J. Med. 323: 570;
Williams et al. (1984) Nature 310: 476; Dick et al.
(1985) Cell 42: 71; Keller et al. (1985) Nature 318:
149 and Anderson et al. (1994) US Patent No. 5,399,346.

There are a number of viral vectors known to
those skilled in the art which can be used to
introduce the nucleic acid into mammalian cells, for
example retroviruses, parvoviruses, coronaviruses,
positive strand RNA viruses such as picornaviruses or
alphaviruses and double stranded DNA viruses including
adenoviruses, herpesviruses such as Herpes Simplex
virus types 1 and 2, Epstein-Barr virus or
cytomegalovirus and poxviruses such as vaccinia,
fowlpox or canarypox. Other viruses include, for
example, Norwalk viruses, togaviruses, flaviviruses,
reoviruses, papovaviruses, hepadnaviruses and
hepatitis viruses.

A preferred method to introduce nucleic acid that
encodes the desired protein into cells is through the
use of engineered viral vectors. These vectors
provide a means to introduce nucleic acids into
cycling and quiescent cells and have been modified to
reduce cytotoxicity and to improve genetic stability.
The preparation and use of engineered Herpes simplex
virus type 1 (D.M. Krisky, et al. (1997) Gene Therapy
4(10): 1120-1125), adenoviral (A. Amalfitano, et
al. (1998) Journal of Virology 72(2): 926-933),
attenuated lentiviral (R. Zufferey, et al., Nature
Biotechnology (1997) 15(9): 871-875) and
adenoviral/retroviral chimeric (M. Feng, et al., Nature
Biotechnology (1997) 15(9): 866-870) vectors are known
to the skilled artisan.

The protein may be administered using methods
known in the art. For example, the mode of
administration is preferably at the location of the
target cells. The administration can be by injection.
Other modes of administration (parenteral, mucosal,
systemic, implant, intraperitoneal, etc.) are
generally known in the art. The agents can,
preferably, be administered in a pharmaceutically
acceptable carrier, such as saline, sterile water,
Ringer's solution and isotonic sodium chloride
solution.



In yet another of its aspects the invention
provides assay methods for identifying compounds that
are able to enhance or inhibit the expression of the
proteins of the invention. These assays can be
conducted, for example, by transfecting a nucleic acid
of the invention into host cells and then comparing
the levels of mRNA transcript or the levels of protein
expressed from said nucleic acids in the presence or
absence of the compound.

Different methods, well known to those skilled in the
art, can be employed in order to measure transcription
or expression levels.

Alternatively, it is possible to identify compounds
that modulate transcription by using a reporter gene
assay of the type well known in the art. In such an
assay a reporter plasmid is constructed in which the
promoter of a gene, whose levels of transcription are
to be monitored, is positioned upstream of a gene
capable of expressing a reporter molecule. The
reporter molecule is a molecule whose level of
expression can be easily detected and may be either
the transcript of the reporter gene or a protein with
characteristics that allow it to be detected. For
example, the molecule may be a fluorescent protein
such as green fluorescent protein (GFP).

Compound assays may be conducted by introducing
the reporter plasmid described above into an
appropriate host cell and then measuring the amount of
reporter molecule expressed in the presence or absence
of the compound to be tested.



The invention also relates to compounds 
identified by the above mentioned methods. 

Further embodiments of the present invention
relate to methods of identifying the relevant gene or
genes which involve the sub-cloning of YAC DNA as
defined above into vectors such as BAC (bacterial
artificial chromosome) or PAC (P1 or phage artificial
chromosome) or cosmid vectors such as exon-trap cosmid
vectors. The starting point for such methods is the
construction of a contig map of the region of human
chromosome 18q between polymorphic markers D18S60 and
D18S61. To this end the present inventors have
sequenced the end regions of the fragment of human DNA
in each of the seven aforementioned YAC clones and
these sequences are disclosed herein. Following
subcloning of YAC DNA into other vectors as described
above, probes comprising these end sequences or
portions thereof, in particular those sequences shown
in Figures 1 to 11 herein, together with any known
sequence tagged site (STS) in this region, as
described in the YAC clone contig shown herein, can
be used to detect overlaps between said subclones and
a contig map can be constructed. Also the known
sequences in the current YAC contig can be used for
the generation of contig map subclones.

One route by which a gene or genes which is
associated with a mood disorder or related disorder
can be identified is by use of the known technique of
exon trapping.

This is an artificial RNA splicing assay, most 
often making use in current protocols of a specialized 
exon-trap cosmid vector. The vector contains an 
artificial minigene consisting of a segment of the 
SV40 genome containing an origin of replication and a 
powerful promoter sequence, two splicing-competent 
exons separated by an intron which contains a multiple 
cloning site and an SV40 polyadenylation site. 

The YAC DNA is subcloned in the exon-trap vector 
and the recombinant DNA is transfected into a strain 
of mammalian cells. Transcription from the SV40 
promoter results in an RNA transcript which normally 
splices to include the two exons of the minigene. If 
the cloned DNA itself contains a functional exon, it 
can be spliced to the exons present in the vector's 
minigene. Using reverse transcriptase a cDNA copy can 
be made and using specific PCR primers, splicing 
events involving exons of the insert DNA can be 
identified. Such a procedure can identify coding 
regions in the YAC DNA which can be compared to the 
equivalent regions of DNA from a person afflicted with 
a mood disorder or related disorder to identify the 

relevant gene. 

Accordingly, in a further aspect the invention
comprises a method of identifying at least one human
gene, including mutated variants and polymorphisms
thereof, which is associated with a mood disorder or
related disorder which comprises the steps of:

(a) transfecting mammalian cells with exon trap
cosmid vectors prepared and mapped as described above;

(b) culturing said mammalian cells in an
appropriate medium;

(c) isolating RNA transcripts expressed from the
SV40 promoter;

(d) preparing cDNA from said RNA transcripts;

(e) identifying splicing events involving exons
of the DNA subcloned into said exon trap cosmid
vectors to elucidate positions of coding regions in
said subcloned DNA;

(f) detecting differences between said coding
regions and equivalent regions in the DNA of an
individual afflicted with said mood disorder or
related disorder; and

(g) identifying said gene or mutated or
polymorphic variant thereof which is associated with
said mood disorder or related disorder.

30 

As an alternative to exon trapping the YAC DNA
may be subcloned into BAC, PAC, cosmid or other
vectors and a contig map constructed as described
above. There are a variety of known methods available
by which the position of relevant genes on the
subcloned DNA can be established as follows:

(a) cDNA selection or capture (also called direct
selection): this method involves
the forming of genomic DNA/cDNA heteroduplexes by
hybridizing a cloned DNA (e.g. an insert of a YAC
DNA), to a complex mixture of cDNAs, such as the
inserts of all cDNA clones from a specific (e.g.
brain) cDNA library. Related sequences will hybridize
and can be enriched in subsequent steps using
biotin-streptavidin capturing and PCR (or related
techniques);

(b) hybridization to mRNA/cDNA: a genomic clone
(e.g. the insert of a specific cosmid) can be
hybridized to a Northern blot of mRNA from a panel of
culture cell lines or against appropriate (e.g. brain)
cDNA libraries. A positive signal can indicate the
presence of a gene within the cloned fragment;

(c) CpG island identification: CpG or HTF islands
are short (about 1 kb) hypomethylated GC-rich (> 60%)
sequences which are often found at the 5' ends of
genes. CpG islands often have restriction sites for
several rare-cutter restriction enzymes. Clustering
of rare-cutter restriction sites is indicative of a
CpG island and therefore of a possible gene. CpG
islands can be detected by hybridization of a DNA
clone to Southern blots of genomic DNA digested with
rare-cutting enzymes, or by island-rescue PCR
(isolation of CpG islands from YACs by amplifying
sequences between islands and neighbouring
Alu-repeats);

(d) zoo-blotting: hybridizing a DNA clone (e.g.
the insert of a specific cosmid) at reduced stringency
against a Southern blot of genomic DNA samples from a
variety of animal species. Detection of hybridization
signals can suggest conserved sequences, indicating a
possible gene.
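
Where subclone sequence is available, the two CpG island indicators named in item (c) above, GC content above 60% over about 1 kb and clustering of rare-cutter restriction sites, can be screened computationally. The following Python fragment is a minimal sketch under that assumption and is not the hybridization or island-rescue PCR procedure. The function names, the window step and the particular enzymes listed are illustrative choices; the recognition sequences shown are the standard ones for those enzymes.

```python
from typing import Dict, List

# Illustrative selection of rare-cutter enzymes with GC-rich recognition sites.
RARE_CUTTER_SITES = {
    "NotI": "GCGGCCGC",
    "BssHII": "GCGCGC",
    "EagI": "CGGCCG",
    "SacII": "CCGCGG",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq: str, window: int = 1000, step: int = 100,
                    min_gc: float = 0.60) -> List[int]:
    """Return start positions of windows whose GC fraction exceeds min_gc."""
    seq = seq.upper()
    last_start = max(len(seq) - window + 1, 1)
    return [i for i in range(0, last_start, step)
            if gc_fraction(seq[i:i + window]) > min_gc]


def rare_cutter_sites(seq: str) -> Dict[str, List[int]]:
    """Return {enzyme: [positions]} for every recognition site found in seq."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, site in RARE_CUTTER_SITES.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)  # allow overlapping sites
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical 3 kb insert: AT-rich flanks around a 1 kb GC-rich block.
    insert = "AT" * 500 + "GCGGCCGC" + "CG" * 496 + "AT" * 500
    print(gc_rich_windows(insert, window=1000, step=500))
    print(sorted(rare_cutter_sites(insert)))
```

A window that is both GC-rich and contains several rare-cutter sites would, on the reasoning of item (c), be a candidate for closer examination.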

Accordingly, in a further aspect the invention
comprises a method of identifying at least one human
gene including mutated and polymorphic variants
thereof which is associated with a mood disorder or
related disorder which comprises the steps of:

(a) subcloning the YAC DNA as described above
into a cosmid, BAC, PAC or other vector;

(b) using the nucleotide sequences shown in any
one of Figures 1 to 11 or any other sequence tagged
site (STS) in this region as in the YAC clone contig
described herein, or part thereof consisting of not
less than 14 contiguous bases or the complement
thereof, to detect overlaps amongst the subclones and
construct a map thereof;

(c) identifying the position of genes within the
subcloned DNA by one or more of CpG island
identification, zoo-blotting, hybridization of the
subcloned DNA to a cDNA library or a Northern blot of
mRNA from a panel of culture cell lines;

(d) detecting differences between said genes and
equivalent region of the DNA of an individual
afflicted with a mood disorder or related disorder;
and

(e) identifying said gene which is associated
with said mood disorders or related disorders.

If the cloned YAC DNA is sequenced, computer
analysis can be used to establish the presence of
relevant genes. Techniques such as homology searching
and exon prediction may be applied.

Once a candidate gene has been isolated in 
accordance with the methods of the invention more 
detailed comparisons may be made between the gene from 

10 a normal individual and one afflicted with a mood 
disorder such as a bipolar spectrum disorder. For 
example, there are two methods, described as "mutation 
testing", by which a mutation or polymorphism in a DNA 
sequence can be identified. In the first the DNA 

15 sample may be tested for the presence or absence of 
one specific mutation but this requires knowledge of 
what the mutation might be. In the second a sample of 
DNA is screened for any deviation from a control 
(normal) DNA. This latter method is more useful for 

20 identifying candidate genes where a mutation is not 
identified in advance ♦ 

In addition, the following techniques may be
further applied to a gene identified by the
above-described methods to identify differences between
genes from normal or healthy individuals and those
afflicted with a mood disorder or related disorder:

(a) Southern blotting techniques: a clone is
hybridized to nylon membranes containing genomic DNA
of patients and healthy individuals digested with
different restriction enzymes. Large differences
between patients and healthy individuals can be
visualized using a radioactive labelling protocol;

(b) heteroduplex mobility in polyacrylamide gels:
this technique is based on the fact that the mobility
of heteroduplexes in non-denaturing polyacrylamide
gels is less than the mobility of homoduplexes. It
is most effective for fragments under 200 bp;

(c) single-strand conformational polymorphism
analysis (SSCP or SSCA): single stranded DNA folds up
to form complex structures that are stabilized by weak
intramolecular bonds. The electrophoretic mobilities
of these structures on non-denaturing polyacrylamide
gels depend on their chain lengths and on their
conformation;

(d) chemical cleavage of mismatches (CCM): a
radiolabeled probe is hybridized to the test DNA, and
mismatches are detected by a series of chemical reactions
that cleave one strand of the DNA at the site of the
mismatch. This is a very sensitive method and can be
applied to kilobase-length samples;

(e) enzymatic cleavage of mismatches: the assay
is similar to CCM, but the cleavage is performed by
certain bacteriophage enzymes;

(f) denaturing gradient gel electrophoresis: in
this technique, DNA duplexes are forced to migrate
through an electrophoretic gel in which there is a
gradient of increasing amounts of a denaturant
(chemical or temperature). Migration continues until
the DNA duplexes reach a position on the gel wherein
the strands melt and separate, after which the
denatured DNA does not migrate much further. A single
base pair difference between a normal and a mutant DNA
duplex is sufficient to cause them to migrate to
different positions in the gel;

(g) direct DNA sequencing.
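
As an illustration of the second kind of mutation testing (screening a sample for any deviation from a control), the comparison that follows direct sequencing can be sketched as below. This is a minimal sketch only: it assumes the two reads are already aligned and of equal length, the function name `find_differences` is hypothetical, and both sequences are invented for the example rather than taken from the figures.

```python
def find_differences(control: str, test: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, control base, test base) for each mismatch.

    Assumes the two reads are already aligned and of equal length; insertions
    and deletions would need a real alignment step, which is not shown here.
    """
    if len(control) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, c, t)
        for i, (c, t) in enumerate(zip(control.upper(), test.upper()))
        if c != t
    ]


# Hypothetical reads, invented for illustration only.
control_read = "ATGGCCAGTCAGCTA"
patient_read = "ATGGCCAGTTAGCTA"
print(find_differences(control_read, patient_read))
```

A substitution reported this way would still have to be checked in further patients and controls before it could be called a disease-associated variant rather than a neutral polymorphism.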

It will be appreciated that, with respect to the
methods described herein, in the step of detecting
differences between coding regions from the YAC and
the DNA of an individual afflicted with a mood
disorder or related disorder, the said individual may
be anybody with the disorder and not necessarily a
member of family MAD31.

In accordance with further aspects the present
invention provides an isolated human gene and variants
thereof associated with a mood disorder or related
disorder and which is obtainable by any of the above
described methods, an isolated human protein encoded
by said gene and a cDNA encoding said protein.

In the experimental report which follows
reference will be made to the following figures:

FIGURE 1 shows a sequence of nucleotides which is
the left arm end-sequence of YAC 766.f.12;

FIGURE 2 shows a sequence of nucleotides which is
a right arm end-sequence of YAC 766.f.12;

FIGURE 3 shows a sequence of nucleotides which is
a left arm end-sequence of YAC 717.d.3;

FIGURE 4 shows a sequence of nucleotides which is
a right arm end-sequence of YAC 717.d.3;

FIGURE 5 shows a sequence of nucleotides which is
a right arm end-sequence of YAC 7 3 i.e. 7;

FIGURE 6 shows a sequence of nucleotides which is
a left arm end-sequence of YAC 752.g.8;

FIGURE 7 shows a sequence of nucleotides which is
a left arm end-sequence of YAC 942.c.3;

FIGURE 8 shows a sequence of nucleotides which is
a right arm end-sequence of YAC 942.c.3;

FIGURE 9 shows a sequence of nucleotides which is
a left arm end-sequence of YAC 961.h.9;



FIGURE 10 shows a sequence of nucleotides which
is a right arm end-sequence of YAC 961.h.9;

FIGURE 11 shows a sequence of nucleotides which
is a left arm end-sequence of YAC 907.e.1;

FIGURE 12 shows a pedigree of family MAD31;

FIGURE 13 shows the haplotype analysis for family
MAD31. Affected individuals are represented by filled
diamonds; open diamonds represent individuals who were
asymptomatic at the last psychiatric evaluation. Dark
gray bars represent markers for which it cannot be
deduced if they are recombinant; and

FIGURE 14 shows the YAC contig map of the region
of human chromosome 18 between the polymorphic markers
D18S60 and D18S61. Black lines represent positive
hits. YACs are not drawn to scale.

FIGURE 15 shows (a) a CAG repeat (in bold) and
surrounding nucleotide sequence isolated from YAC
961_h_9. The sequence in italics is derived from End
Rescue of the fragmented YAC. (b) PCR primers that can
be used to determine the extent of trinucleotide
repeats in the sequence.

FIGURE 16 shows (a) a CAG repeat (in bold) and
surrounding nucleotide sequence isolated from YAC
766_f_12. The sequence in italics is derived from End
Rescue of the fragmented YAC. (b) PCR primers that can
be used to determine the extent of trinucleotide
repeats in the sequence.

FIGURE 17 shows (a) a CAG repeat (in bold) and
surrounding nucleotide sequence isolated from YAC
766_f_12. The sequence in italics is derived from End
Rescue of the fragmented YAC. (b) PCR primers that can
be used to determine the extent of trinucleotide
repeats in the sequence.

FIGURE 18 shows (a) a CTG repeat (in bold) and
surrounding nucleotide sequence isolated from YAC
907_e_1. The sequence in italics is derived from End
Rescue of the fragmented YAC. (b) PCR primers that can
be used to determine the extent of trinucleotide
repeats in the sequence.



Experimental

(a) Family Data

Clinical diagnoses in MAD31, a Belgian family with a
BPII proband, were described in detail in De bruyn et
al. (1996). In that study only the 15 family members who
were informative for linkage analysis were selected
for additional genotyping. The different clinical
diagnoses in the family were as follows:
1 BPI, 2 BPII, 2 UP, 4 major depressive disorder (MDD),
1 SAm and 1 SAd.

The pedigree of the MAD31 family is shown in
Figure 12.

(b) Genotyping of Family Members

All short tandem repeat (STR) genetic markers are di-
or tetranucleotide repeat polymorphisms. Information
concerning the genetic markers used in this study was
obtained from several sources on the internet: Genome
DataBase (GDB, http://gdbwww.gdb.org/), GenBank
(http://www.ncbi.nlm.nih.gov/), Cooperative Human
Linkage Center (CHLC, http://www.chlc.org/), Eccles
Institute of Human Genetics (EIHG,
http://www.genetics.utah.edu/) and Genethon
(http://www.genethon.fr/). Standard PCR was performed
in a 25 µl volume containing 100 ng genomic DNA, 200
mM of each dNTP, 1.25 mM MgCl2, 30 pmol of each
primer and 0.2 units Goldstar DNA polymerase
(Eurogentec). One primer was end-labelled before PCR
with [gamma-32P]ATP and T4 polynucleotide kinase. After
an initial denaturation step at 94 °C for 2 min, 27
cycles were performed at 94 °C for 1 min, at the
appropriate annealing temperature for 1.5 min and
extension at 72 °C for 2 min. Finally, an additional
elongation step was performed at 72 °C for 5 min. PCR
products were detected by electrophoresis on a 6%
denaturing polyacrylamide gel and by exposure to an
X-ray sensitive film. Successfully analysed STSs, STRs
and ESTs covering the refined candidate region are
fully described herein on pages 36 to 54.
(c) Lod score analysis

Two-point lod scores were calculated for 3
different disease models using Fastlink 2.2
(Cottingham et al. 1993). For all models, a
gene frequency of 1% and a phenocopy rate of 1/1000
were used. Model 1 included all patients and unaffected
individuals, with the latter individuals being assigned
to a disease penetrance class depending on their age
at examination. The 9 age-dependent penetrance classes
as described by De bruyn et al. (1996) were multiplied
by a factor 0.7, corresponding to a reduction of the
maximal penetrance of 99% to 70% for individuals older
than 60 years (Ott 1991). Model 2 is similar to model
1, but patients were assigned a diagnostic stability
score calculated from clinical data such as the
number of episodes, the number of symptoms during the
worst episode and history of treatment (Rice et al.
1987, De bruyn et al. 1996). Model 3 is as model 1 but
includes only patients.
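
To make the quantity being reported concrete, the sketch below computes a two-point lod score for the simplest textbook case: phase-known, fully informative meioses with complete penetrance. This is an assumption-laden simplification and not the Fastlink calculation used in the study, which evaluates full pedigree likelihoods under the penetrance classes and phenocopy rate described above. The function name `two_point_lod` and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Lod score Z(theta) = log10(L(theta) / L(0.5)) for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # an observed recombinant excludes theta = 0
    non_recombinants = meioses - recombinants
    # log10 of theta^k * (1 - theta)^(n - k); the theta term is skipped when k = 0
    log_l_theta = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_theta += recombinants * math.log10(theta)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Scaling of the maximal penetrance described in the text: 99% times 0.7.
max_penetrance = 0.99 * 0.7

# Hypothetical example: 7 informative meioses, no recombinants, theta = 0.
print(round(two_point_lod(0, 7, 0.0), 2))
print(round(max_penetrance, 3))
```

In this simplified setting each non-recombinant meiosis contributes log10(2), about 0.3, at theta = 0, which shows why a single family of modest size cannot by itself reach very high lod scores.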

(d) Construction of the YAC contig - Protocols

Growing of YACs and extraction of YAC DNA were
done according to standard protocols (Silverman,
1995). For the construction of the YAC contig spanning
the chromosome 18q candidate region, the data of the
physical map based on sequence tagged sites (STSs)
(Hudson et al. 1995) were consulted on the Whitehead
Institute (WI) internet site
(http://www-genome.wi.mit.edu/). CEPH mega-YACs were
obtained from the YAC Screening Centre Leiden (YSCL, the
Netherlands) and from CEPH (Paris, France). The YACs
were analyzed for the presence of STSs and STRs,
previously located between D18S51 and D18S61, by
touchdown PCR amplification. Information on the
STSs/STRs was obtained from the WI, GDB, Genethon,
CHLC and GenBank sites on the Internet. Thirty PCR
cycles consisted of: denaturation at 94 °C for 1 min,
annealing (2 cycles for each temperature) starting
from 65 °C and decreasing to 51 °C for 1.5 min and
extension at 72 °C for 2 min. This was followed by 10
cycles of denaturation at 94 °C for 1 min, annealing at
50 °C for 1.5 min and extension at 72 °C for 2 min. A
final extension step was performed for 10 min at 72 °C.
Amplified products were visualised by electrophoresis
on a 1% TBE agarose gel and ethidium bromide staining.
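
The touchdown programme described above can be written out cycle by cycle. The sketch below assumes that the annealing temperature drops in 1 °C steps, which is what "2 cycles for each temperature" from 65 °C to 51 °C implies if the total is to be thirty cycles. The function name `touchdown_schedule` and its parameter names are hypothetical.

```python
def touchdown_schedule(start_c=65, end_c=51, cycles_per_temp=2,
                       final_anneal_c=50, final_cycles=10):
    """Return the annealing temperature (in degrees C) used in each PCR cycle."""
    step_down = [
        temp
        for temp in range(start_c, end_c - 1, -1)  # 65, 64, ..., 51
        for _ in range(cycles_per_temp)
    ]
    return step_down + [final_anneal_c] * final_cycles


schedule = touchdown_schedule()
print(len(schedule))   # total number of cycles
print(schedule[:6])    # first three temperature steps
```

Listing the schedule this way makes it easy to confirm that the step-down phase accounts for thirty cycles before the ten fixed cycles at 50 °C.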

(e) Ordering of the STR markers

Twelve STR markers, previously located between
D18S51 and D18S61, were tested for cosegregation with
bipolar disease in family MAD31. The parental
haplotypes were reconstructed from genotype
information of the siblings in family MAD31 while
minimizing the number of possible recombinants. The
result of this analysis is shown in Figure 13. The
father was not informative for 3 markers, the mother
was not informative for 5 markers. Haplotypes in
family MAD31 suggested the following order for the
STR markers analysed: cen-[S51-S68-S346]-[S55-S969-
S1113-S483-S465]-[S876-S477]-S979-[S466-S817-S61]-tel.

The order relative to each other of the markers
between brackets could not be inferred from our
haplotype data. The marker order in family MAD31 was
compared with the marker order obtained using
different mapping techniques and the results are shown in
Table 1 below.



Table 1. Comparison of the order of the markers within the 18q candidate region for bipolar
disorder, among several maps.

                       Genetic maps              Radiation hybrid map
Marker*                Genethon     Marshfield   (Giacalone et al. 1996)

D18S51                              (-)3.4 cM    (-)27.9 cR
D18S68                 0 cM         0 cM         0 cR
D18S346                             5.3 cM       52.2 cR
D18S55                 0.1 cM       0 cM         [illegible] cR
D18S969, D18S1113      0.7 cM       0.6 cM
D18S483                2.5 cM       3.2 cM       88 cR
D18S465                4.5 cM       5.3 cM       101.3 cR
D18S876
D18S477                4.4 cM       5.3 cM       166.4 cR
D18S979                             8.9 cM
D18S466                7.6 cM       11.1 cM      212.4 cR
D18S61                 8.4 cM       11.8 cM      249.5 cR
D18S817                             5.3 cM       260.6 cR

* Order according to haplotyping results in family MAD31.
(-) Marker is located proximal of D18S68.

D18S68, common to all 3 maps, was taken as the
map anchor point, and the genetic distances in cM or cR
of the other markers relative to D18S68 are given. The
marker order is in good agreement with the order of
the markers on the recently published chromosome 18
radiation hybrid map (Giacalone et al. (1996) Genomics
37:9-18) and the WI YAC-contig map
(http://www-genome.wi.mit.edu/). However, a few
discrepancies with other maps were observed. The only
discrepancy with the Genethon genetic map is the
reversed order of D18S465 and D18S477. Two discrepancies
were observed with the Marshfield map
(http://www.marshmed.org/genetics/). The present
inventors mapped D18S346 above D18S55 based on
maternal haplotypes, but on the Marshfield map
D18S346 is located between D18S483 and D18S979. The
inventors also placed D18S817 below D18S979, but on
the Marshfield map this marker is located between
D18S465 and D18S477. However, the location of D18S346
and D18S817 is in agreement with the chromosome 18
radiation hybrid map of Giacalone et al. (1996). One
discrepancy was also observed with the WI radiation
hybrid map (http://www-genome.wi.mit.edu/), in which
D18S68 was located below D18S465. However, the
inventors as well as other maps placed this marker
above D18S55.
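
The comparison between the haplotype-derived order and a published map can be sketched as a search for marker pairs whose relative order is reversed. The example below uses the Genethon positions from Table 1, restricted to markers with an unambiguous Genethon entry there (the shared D18S969/D18S1113 row is left out). The function name `discordant_pairs` is hypothetical, and the sketch ignores the fact that markers within a bracketed cluster are not ordered relative to each other.

```python
from itertools import combinations

# Haplotype-derived order in family MAD31 (Table 1), restricted to markers
# that have an unambiguous Genethon position in that table.
haplotype_order = ["D18S68", "D18S55", "D18S483", "D18S465",
                   "D18S477", "D18S466", "D18S61"]

# Genethon positions in cM relative to D18S68, copied from Table 1.
genethon_cm = {"D18S68": 0.0, "D18S55": 0.1, "D18S483": 2.5, "D18S465": 4.5,
               "D18S477": 4.4, "D18S466": 7.6, "D18S61": 8.4}


def discordant_pairs(order, position):
    """Pairs (a, b) with a before b in `order` but a mapped distal to b."""
    return [(a, b) for a, b in combinations(order, 2)
            if position[a] > position[b]]


print(discordant_pairs(haplotype_order, genethon_cm))
```

With these inputs the only reversed pair is D18S465 and D18S477, in line with the single Genethon discrepancy noted in the text.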

(f) Lod score analysis and refinement of the
candidate region

Lod score analysis gave positive results with all
markers, confirming the previous observation that
18q21.33-q23 is implicated in BP disease, at least in
family MAD31 (De bruyn et al. 1996). Summary
statistics of the lod score analysis under all models
are given in Table 2 below.

The highest two-point lod score (+2.01 at θ=0.0)
was obtained with markers D18S1113, D18S876 and
D18S477 under model 1 in the absence of recombinants
(Table 2). In model 1, all individuals with a BP
spectrum disorder are considered affected and fully
contributing to the linkage analysis.
Before the fine mapping the candidate region was
flanked by D18S51 and D18S61, which are separated by a
genetic distance of 15.2 cM on the Marshfield map or
13.1 cM on the Genethon map. The informative
recombinants with D18S51 and D18S61 were observed in 2
affected individuals (II.10 and II.11 in Fig. 13).
However, since no other markers were tested within the
candidate region it was not known whether these
individuals actually shared a region
identical-by-descent (IBD). The additional genetic mapping
data now indicate that all affected individuals are sharing
alleles at D18S969, D18S1113, D18S876 and D18S477
(Fig. 13, boxed haplotype). Also, alleles from markers
D18S483 and D18S465 are probably IBD, but these
markers were not informative in the affected parent
I.1. Obligate recombinants were observed with the STR
markers D18S68, D18S346, D18S979 and D18S817 (Table 2,
Fig. 13). Since discrepancies between different maps
were observed for the locations of D18S346 and
D18S817, the present inventors used D18S68 and D18S979
to redefine the candidate region for BP disease. The
genetic distance between these 2 markers is 8.9 cM
based on the Marshfield genetic map
(http://www.marshmed.org/genetics/).

(g) Construction of the YAC contig

According to the WI integrated map, 56 CEPH
megaYACs are located in the initial candidate region
contained between D18S51 and D18S61 (Chumakov et al.
(1995) Nature 377 Suppl., De bruyn et al. (1996)).
From these YACs, those were selected that were located
in the region between D18S60 and D18S61. D18S51 is not
presented on the WI map, but is located close to
D18S60 according to the Marshfield genetic map
(http://www.marshmed.org/genetics/). To limit the
number of potential chimaeric YACs, YACs were
eliminated that were also positive for non-chromosome
18 STSs. As such, 25 YACs were selected (see Figure
14), and placed in a contig based on the technique of
YAC contig mapping, i.e. sequences from sequence
tagged sites (STSs), simple tandem repeats (STRs) and
expressed sequence tags (ESTs), known to map between
D18S60 and D18S61, were amplified by PCR on the DNA
from the YAC clones. The STS, STR and EST sequences
used are described from page 36 to 54. Positive YAC
clones were assembled in a YAC contig map (Figure 14).

Three gaps remained in the YAC contig, of
which one, between D18S876 and GCT3G01, was located in
the refined candidate region. To close the gap
between D18S876 and GCT3G01, 14 YAC clones (Table 3,
on page 62) were further analysed. End fragments from
YAC clones 766.f.12 (SV11R), 752.g.8 (SV31L), 942.c.3
(SV10R) were obtained and sequenced (see pages 55-61).
Primers from these three sequences were selected, and
DNA of each of the 14 YAC clones was amplified by PCR.
As indicated in Table 3, overlaps were obtained
between 7 YAC clones on the centromeric side, and two
YAC clones on the telomeric side (717.d.3 and 907.e.1).
The final YAC contig is shown in Figure 14.
In the figure, only the YAC clones which rendered
unambiguous hits with the chromosome 18 STSs, STRs and
ESTs are shown. In a few cases, weak positive signals
were also obtained with some of the YAC clones, which
likely represent false positive results. However,
these signals did not influence the alignment of the
YAC clones in the contig. Although all YACs known to
map in the region were tested, as well as all available
STSs/STRs, initially the gap in the YAC contig was
not closed. However, this was subsequently achieved
by determining the end-sequences of the eight selected
YACs (see below). The order of the markers provided by
the YAC contig map is in complete agreement with the
marker order provided by the WI map, which integrates
information from the genetic map, the radiation hybrid
map and the STS YAC contig map (Hudson et al. 1995).
Also, the YAC contig map confirms the order of the STR
markers as suggested by the haplotype analysis in
family MAD31. Moreover, the YAC contig map provides
additional information on the relative order of the
STR markers. For example, D18S55 is present in YAC
931_g_10 but not in 931_f_1 (Fig. 14), separating
D18S55 from its cluster [S55-S969-S1113-S483-S465]
obtained by haplotype analysis in family MAD31. The
centromeric location of D18S55 is defined by the
STS/STR content of surrounding YACs (Fig. 14). If we
combine the haplotype data and the YAC contig map the
following order of STR markers is obtained:
cen-[S51-S68-S346]-S55-[S969-S1113]-[S483-S465]-S876-S477-S979-S466-[S817-S61]-tel.
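
The logic of STS-content mapping used to assemble the contig, and to recognise a gap in it, can be sketched as follows. Two adjacent markers are considered linked when at least one YAC is positive for both; a pair with no such YAC marks a gap. The function name `find_gaps`, the marker labels and the YAC names are all hypothetical placeholders, since the hit data of Figure 14 are not reproduced in this section.

```python
def find_gaps(marker_order, yac_hits):
    """Return adjacent marker pairs that no single YAC is positive for."""
    gaps = []
    for left, right in zip(marker_order, marker_order[1:]):
        bridged = any(left in hits and right in hits
                      for hits in yac_hits.values())
        if not bridged:
            gaps.append((left, right))
    return gaps


# Hypothetical PCR results: each YAC maps to the set of markers it amplified.
marker_order = ["M1", "M2", "M3", "M4", "M5"]
yac_hits = {
    "yac_a": {"M1", "M2"},
    "yac_b": {"M2", "M3"},
    "yac_c": {"M4", "M5"},
}
print(find_gaps(marker_order, yac_hits))
```

A gap found this way can reflect either a true absence of clones or a lack of markers in the interval, which is why new STSs derived from YAC end-sequences were needed to close the gap described in the text.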

Out of the 25 YAC clones spanning the whole
contig, seven YAC clones were selected in order to
identify the minimal tiling path (Table 4). These 7
YAC clones cover the whole refined chromosome 18
region. Furthermore, YAC clones should preferably be
non-chimeric, i.e. they should only contain fragments
from human chromosome 18. In order to examine for the
presence of chimerism, both ends of these YACs were
subcloned and sequenced (pages 55 to 61). For each of
the sequences, primers were obtained, and DNA from a
monochromosomal mapping panel was amplified by PCR
using these primers. As indicated on pages 55 to 61,
some of the YAC clones contained fragments from other
chromosomes, apart from human chromosome 18.

Three YAC clones were then selected
comprising the minimum tiling path (Table 5). These
three YAC clones were stable as determined by pulsed
field gel electrophoresis and their sizes correspond
well to the published sizes. These YAC clones were
transferred to other host yeast strains for
restriction mapping, and are the subject of further
subcloning.
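
One common way to think about choosing a minimal tiling path is as a greedy interval-cover problem over the ordered markers. The sketch below is offered only as an illustration under that assumption; the text does not state how the YACs were actually chosen, and criteria such as chimerism and clone stability are not modelled. The function name `minimal_tiling_path`, the YAC names and the marker spans are hypothetical.

```python
def minimal_tiling_path(n_markers, yac_spans):
    """Greedy minimum set of YACs linking marker 0 to marker n_markers - 1.

    yac_spans maps a YAC name to the (first, last) marker index it is
    positive for. Consecutive YACs in the path must share a marker.
    """
    path = []
    current = 0
    last = n_markers - 1
    while True:
        # YACs that contain the marker reached so far
        candidates = {name: span for name, span in yac_spans.items()
                      if span[0] <= current <= span[1]}
        if not candidates:
            raise ValueError(f"no YAC covers marker index {current}")
        best = max(candidates, key=lambda name: candidates[name][1])
        reach = candidates[best][1]
        path.append(best)
        if reach >= last:
            return path
        if reach == current:
            raise ValueError(f"gap after marker index {current}")
        current = reach


# Hypothetical spans over six ordered markers (indices 0 to 5).
yac_spans = {"yac_a": (0, 2), "yac_b": (1, 3), "yac_c": (2, 4), "yac_d": (3, 5)}
print(minimal_tiling_path(6, yac_spans))
```

At each step the YAC that contains the current marker and extends furthest is taken, which yields a path with the fewest clones for this simplified model.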
Description of the successfully analysed STSs, STRs and ESTs
covering the refined candidate region.

Explanations: 

• STS: Sequence Tagged Site 

• STR: Simple Tandem Repeat 

• EST: Expressed Sequence Tag 

These markers are ordered [...] the
markers that were effectively tested and that worked on the YACs are given.

List:

1. D18S60:

Database ID: AFM178XE3 (Also known as 178xe3, Z16781, D18S60)
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs
Chromosome: Chr18

Primers:
Left = CCTGGCTCACCTGGCA
Right = TTGTAGCATCGTTGTrATGTTCC
Product Length = 157

Arr ATarTAC AAGANTrTATCCAAAACTGAGATTTCCTTAGAA I a i u io i i 
Si^AnclGTTAAT^ATTCTAnGAAAACATCAAACTTAT 

AAAGCT 1 

Genbank ID: Z16781

Description: H. sapiens (D18S60) DNA segment containing (CA) repeat;
clone

2. WI-9222:

Database ID: UTR-03540 (Also known as G06101, D18S1033, 9222,
X63657)
Source: WICGR: Primers derived from Genbank sequences
Chromosome: Chr18

Primers:
Left = GATCCCATAMGCTACGAGGG
Right = GAGTCTAAAhACAAGAAAGCATTGC
Product Length = 99

Review complete sentence: _ fl _ Tr „ CCAAATAAT -rTGAACAGCTTG 

tcttcttaccccttg^ 

CTGCTAMTGGGACCcV^ r G r G ^ 

GAATACGTCAGATOG^ 

GGGGCTAGAAGTTCACOJCCTGACAGTAU^ 

gaataggagaccatttgWc^t 

TATGAGAATTAATAGCG^ 
ATCATTAMTTTGTTTCTAjTTTATTC^ 

CAGAAATGTACTTTCTG^ 
CTTCCATAAAAAMCAA^CGG^CTCG^ 

NNNNNNNNNNNNN^N^NNNNNN^ 

TCTTTCTTTGTGTATTTTATyCAA^ 

CTCSAATGTCttCTGACAGt^ 
TGGTGGAMCTCATGGCnfcTC^ 

ggggacgggagagggcag\g^ 

mAmCTCATTCATGGGG^ 

AGCGGTAAMTCTAGAAGC^G^mACAG^^ CCCATCT 

tgacctgaagtt^ctat^ 

CTTC CTTT C C AATTTTG GTTA mCT G ^ J G _Ir^^ ^ q jjq a q jjtTT AACTT 
NNNNNNNNNNNNNNNGACC A^ CTAAMTTTT CG AC^ 

. a s*-rr* A-r/-> A ATTA&TTAAAGCA 



Genbank ID: X63657
Description: H. sapiens fvt1 mRNA

3. WI-7336:

Database ID: UTR-C4664 (Also known as P15, G00-679-135, G06527, 7336,
Source: WICGR: Primers derived from Genbank sequences
Chromosome: Chr18

Primers:
Left = AGACATTCTCGCTTCCCTC
Right = AATTTTGACCCCTTATGGtEC
Product Length = 332
Review complete sequence:
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acacagatagaccnnnAnnnn^ 

CCCATAA^CAATGACA^ 

AAAAATAmATOATTA^GTCAAATT^i^ ^ GATTTGGAAGCT CTTCT 

tcccagcactatgcttt^ 

TPftnTTCCCTGAM GACTGAAGAAA^ 

ctgccctggctccagtgaMcttggg^ 

CCAGAAGTCCTOTCTTA^ 
GAATmGGGGATTTTC^ 
GMCTTCATGGATCAGATCteG^ 

tatgctgcmcaamtgtagW^^ 

ACTOCAAAMCACrrCGTTCbcAbAGCTmCA^ 

ATMGGAATTATAGACCTCTAfeTA^ 

GTTCA(*TCUAATATAAA^^ 

GTCATGTGGTTGGCACTAGA^^^ 

CACAGGGATTCTCACAATAGCj^ 

GTCTCTTCATCTMTATGATAGgGGGM 

TAGAAAATATAAGTAAAGTGATnAAA^G^TCA^ 

TTmCAGTCTATGGGTTTAG^k^^ 
TTMTAGTMTTTGTAAAGTTGGGT^GATAAG 

CATGGATOCTOTCTATAA^TAT^ 

TTCCTTCTCCCATCTCTTCCTTG^^^^ 
TCTGAGATTCAATATTGAATTTCTlCCTATGCTATTbAUAM 

GAACTACC 

Genbank ID: G06527
Description: WICGR: Random genome wide STSs

4. WI-8145:

Database ID: EST,02441 (Also known as 01BS12H, G00-677-827, G0684S,
Source: WICGR: STSs derived from dbEST sequences

Chromosome: Chr18

Primers:
Left = GAAATGCACATAACATATAtVtGCC
Right = TGCTCACTGCCTATTTAATfeTAGC
Product Length = 134
Review complete sequence:

GTTGTTTGGANGCAGGmAT^ 

acagacatatatatgtgttatgtat^ 

G^TATTGTTTMTGTTTTTTC^ 
GGATACCTACTTAnCTTCAT^ 

aggamttaacagancatc^ 

rtnCAGTGAGCA NTAATTTAAAANCTCACCA I lAlAi/^ 



aaagtaaaag 
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: left and fight primer 



PCR Conditions 

H 0m o sapiens cONA Cone 70692 * similar .0 

ftS Ciuster Caption: Hum. *T *9** h (p ' aSmin ° 96n 
activator-inhibitor 2. pi-2) Search for GOB entry 



5. WI-7061:

Database ID: UTR-029oU (Also known as PAI2, G00-678-979, G06377, 7061,

Source: WICGR: Primers derived from Genbank sequences
Chromosome: Chr18

Primers:
Left = TGCTCTTCTG^SpAACTTCTGC
Right = ATAGAAGGGCAfrGGAGGGAT
Product Length = 338
Review complete sequence:

T(*GAATTGCTAmCAAA^ 

miGCICJTCJ^AACAACj^^ 

AATtAGACAATTGTCTATl A^J^^^^^^ATTTr 

TCTAAAATGGGATCATGCC^ 

TATMCATTAACTTTTACTTTteTOmATO 

AAAnATTGCTCACTGCCTAT^MTGTAGCT^^ ^ TAATGA 

cctgcttccaaacaacnnnnnVjnnnnnnnnggaattc 

Genbank ID: G06377

Description: WICGR: Random genome wide STSs



6. D18S68:

Database ID:

Source: J Weissenbach
Chromosome: Chr18
Primers:

Left = ATGGGAGA&GTAATACACCC
Right = ATGCTGCTGGTCTGAGG
Product Length = 285
Review complete sequence:

a^catggcTctag^ 

GCNAAATGGTGATCTATC^CCTTCCAG 

Description: H. sapiens (D18S68) DNA segment containing (CA) repeat;
clone

7. WI-3170:

Database ID: MR3726 (Also known as D18S1037, G04207, HALd22f2, 3170)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:

Left = TGTGCTACTGATTAAGGTAAAGGC

Right = TGCTTCTTCAATTTG|AGAGTTGG
Product Length = 156
Review complete sequence:

CTGAGACAAGGCAGGCAAAcU^^ 
^WGGCA^fe 

AAAAAAATGGGGnCCTGATTTCf GGATAATAAT r r A ACTCTACAAA o m 
AGAAGCAA CATACCCTCTTTGTTA 

Genbank ID: G04207

Description: WICGR: Random genome wide STSs



8. WI-5654:

Database ID: MR1CSC8 (Also known as D18S1259, G00-678-695, G05278,
5654)
Source: WICGR: Random genome wide STSs

Chromosome: Chr18

Primers:
Left = CTTAATGAMACAATGCCAGAGC
Right = TGCAAAATGTGGAATAATCTGG
Product Length = 149
Review complete sequence:

TTTTGCAA ATA
Genbank ID: G05278

Description: WICGR: Random genome wide STSs

9. D18S55:

Database ID: AFM122XC1 (Also known as 122xc1, Z16621, D18S55)
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs

Chromosome: Chr18

Primers:

Left = GGGAAGTCAAATGOWVATC

Right = AGCTTCTGAGTAATG^TTATGCTGTG
Product Length = 143

Review complete sequence: \ ^^...^T^A^n-rrrftTArA 

agctgmcatgccttttcatggWgcagtttc^ 

AGCT 1 

Genbank ID: Z16621

Description: H. sapiens (D18S55) DNA segment containing (CA) repeat;
clone

10. D18S969:

Database ID: GATA-P18099 (Also known as G08003, CHLC.GATA69F01,
CHLC.GATA69F01.P18099)
Source: CHLC: genetically mapped polymorphic tetranucleotide repeats

Chromosome: Chr18

Primers:

Left = AACAAGTGTGTATGGGGGK



v 
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Right = CATATTOACCCAG 
Product Length = 365 
Review complete sequence: 



- 42 
GTTGC 



CAGGGAAATGCAAATGJ AAAAC CACAATGAGTTATCTC CTCATACCTTTAAT 

AAAACttGMTGTNCGAtokCTCTO^ 
TTATGCAAAACAGTATGAATCTTTATCAGTATA^ 

£ T TcS 

^TGTGGAMCTGAAAMGC^TAC^^ 

ggttaccaggggcmagaggStagamtgaggggagtgaga^^ 

AATCAAAGTGTAAGAATGTTAT^ACATAAATAAATTCATAGAG 

Genbank ID: G08003

Description: human STS CHLC.GATA69F01.P18099 clone GATA69F01.

11. D18S1113:

Database ID: AFM200VG9 (Also known as D18S1113, 200vg9, "2403)
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs
Chromosome: Chr18

Primers:

Left = GTTGACTCAAGTCCAAACCTG
Right = CAAAGACATTGTAGACGTTCTQTG
Product Length = 207

Review complete sequence: L„..^,^ArtTPTarnrA 
AGCTGCATATAAAACTATTCCATTTCACAT^TTGAAGACA 

TGATACTTTGCTGTTGTCTGTGGGCCACCT\CTTmGAAGTGT^ 

ACTGTGCTCCTOTAATCTQTTOTO re^^ 
GCGTGGCATGmCTNCMCTTGATGTGATC^TATTTATCACTTr 

AGTTAAGTCTCTATGTCTTTGTATTCTTTC^ 
GTGCATGCACACGCATAAACACACACACACAtACACACACACACAGAGA 

PAnAfi ArAnAGAACGTCT ArAATGTCTTTGTQAG 



12. D18S868:

Database ID: GATA-D18S868 (Also known as G09ii3, CHLC.GATA3E12,
CHLC.GATA3E12.436, CHLC.496, D18S868)

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats
Chromosome: Chr18

Primers:
Left = AGCCMTAdcTTGTAGTAMTATCC
Right = GATTCTCCKGACAAATAATCCC
Product Length = 189

Review complete seduence: ATAT ,~ r AT rTATrTTTGATGTATCTAT 
GAGTGAGC£AATA£C>T^^ 
GTATCTATCTTTGTATCTATATGTCTATGTATCT^ 

CTATCttTCTATCTATcW 

ATCTATCTATATCJ^TTTfjHO^ 

Description: human STS CHLC.GATA3E12.P6553 clone GATA3E12.



13. WI-9959:

Database ID: MR12816 (Also known as D18S1251, G00-678-S24, G05488,
9959)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:
Left = TGCCAACAGCAGTCAAGC
Right = AGCACCTGCAGCAG.TAATAGC
Product Length = 110

TNATTTTTTTCCTCTGCATTATA^TTAC 

Genbank ID: G05488
Description: WICGR: Random genome wide STSs

14. D18S537:

Database ID: CHLC.GATA2E06.13 (Also known as CHLC.13, GATA2E06,

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats

Chromosome: Chr18

Primers:
Left = TCCATCTATCmGATGTATQTATG
Right = AGTTAGCAGACTATGTTAAT^CAGGA

Product Length = 151
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Review complete sequehc* nTftftftTftT rr;.AT r f TATCTTTGal6 I 
AAAGCTGAGTGAGCCAAl£CCTjG^ 

aj^tatgtatctatctttgYatctatat^ 
atctatctatcatctat^ 

CTATCTATCTATCTATATC<^^^ 

Description: human STS CHLC.GATA2E06.P6006 clone GATA2E06.

15. D18S483:
Chromosome: Chr18

Primers:

Left = TTCTGCACAATTTCAATAGATTC

Right = GAACTGAGCAAACGAGTATGA
Product Length = 214

Review complete sequence: L,. T ^ a r * A-r-rrr AATAGATTCC 

AGCTCTGCTGGAAGAG^ 

CCTACCCTGGGTTmCACT^^ 
TAGATAGATAGATAGATAGATA^ 
ATATATAGTATATAAAATCTACA^C^ 
TTTGCCTTTCCTTGACTATr OTftrTrnpTfiCTCAfa 

atttttgtttgtaaatccaaaatgctt 

Description: H. sapiens (D18S483) DNA segment containing (CA) repeat;
clone

16. D18S465:

Chromosome: Chr18

Primers:
Left = ATATTCCCCTATGGAAGTACAG

Right = AAAGTTAATTTTCAGGCACTCT

Product Length = 232

Review complex sequence: ^X, -r-r^/^rrTA TGGAAGTA 

AGCTCTGTCCCTCTAGAC^^ 

C AG AT G G TTTTT N T AAAAT AAATTT AT C TG ATT QT G A ^ 
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tttttatgttcagtgtttYtctaaatt^^ 
aaatggtttttaaatatgcacatatgtgcatattttacacacacacacaca 
cacacacacactctcmatttagaagcattatagataga^tg^ctg^aaa 
ttaacttt taaccnaagaaftagacaataaggaacaatagggaagttatcc 
tttgctaagggtatggaa^tattcacatattatttataacangttaaacc 
aagtcatgcttgantataa^agct 

Genbank ID: Z23850

Description: H. sapiens (D18S465) DNA segment containing (CA) repeat;
clone



17. D18S968:

Database ID: GATA-P34272 (Also known as G10262, CHLC.GATA117C05,
CHLC.GATA117C05.P34272)

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats
Chromosome: Chr18

Primers:

Left = GAAATTAACCAGACACTCCTAACC

Right = CTTAGAATTGCCTTTGCTGC

Product Length = 147

Review complete sequence:
GAATAAAAATATG AG GTATTAG AAATTTAC AGATAGGAAGjAAATT^ACCAG 
ACACTCCTAACCA CCGATNAGTTTAAAGAGGAGATAGATAGATAGATGAT 
AGATAGATAGATAGATAGATAGATACqACTGAAAATGCAANCACAAATTA 
nCAfiATTATATnTGAT GCAGCAAAGGteAATTCTAAG TAGATTCTAACTGC 

tacattgatagcagtacccactgacat\taccggaaaggatggtatccata 
accacctacctatatacctccgcagctgganattaggnttaagcttcttn 
gggcncctggcggccccnnttgtggtpcccggtnggnccccgnttnn 
gnntngctnngnttncnttggngnccdccnntnggtttnnggnnnnnt 
nnnnntngnnnnnttncccnnnnnnnnYntnnnncnnnnnnnnntnnn 

nnnnnnnnnnggnnnnngggn 



Genbank ID: G10262
Description: human STS CHLC.GATA117C05.P34272 clone GATA117C05. 



18. GATA-P6051: 

Database ID: GATA-P6051 (Also known as CHLC.GATA3E08,
CHLC.GATA3E08.P6051)

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats
Chromosome: Chr18



Primers: 

Left = GCAACAACCCTAATGAGTATACG 
Right = GAGTCTCAOCAGGGCTTACA
Product Length = 149

^T^GCACCCAGA^^^^^ 
ACGATATCTATCTATCTAiCTATCTATCTATC^ 

T 

Description: human STS CHLC.GATA3E08.P6051 clone GATA3E08.

19. D18S875:

Database ID: GATA-D18S875 (Also known as G08001, CHLC.GATA52H04,
CHLC.GATA52H04.P16177)
Source: CHLC: genetically mapped polymorphic tetranucleotide repeats
Chromosome: Chr18 

Primers:

Left = TCCTCTCATCTCGGATAIGG 
Right = AAGGCTTTCAGACTTAQACTGG 
Product Length = 394 
Review complete sequence: \__ A -- A A-rTTrrTTTAATGGCNANG 

ttatttattcactcattcmtaaatAtttatgaatttc^ 
naccnaagaanctgctccctgtnaaactngag^ 

GTATAGNTCCAGTCCNAAGTCTANAGAC^ 

AGCCTTATTTTCTGCAACATTGTTCTATTCAGACCCTTNANANGATTGACN 
ATGTCCACCCA 

Genbank ID: G08001
Description: human STS CHLC.GATA52H04.P16177 clone GATA52H04.

Search for GDB entry 
20. WI-2620:

Database ID: MR'.*3S (Also known as G03602, D18S890, HHAa12h3, 2620)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:

Left = TCTCC/UGCTATTGATTGGATAA

Right = TTAAGmGCCAATTTATATAAAAGCAGC
Product Length = 177

tcttcttgagcctcYcattctgTg^ 
gacacacaaatatctgactcaaggaamggaa^ 

n Ar^TGCTTTTAT ^^ATTr,nnTCTTAMCTTTCTAAGTTTATTATGGAT 



Genbank ID: G03602 

Description: WICGR: Random genome wide STSs
Search for GDB entry 



21. WI-4211:

Database ID: MR6633 (Also known as G03617, D18S980, 4211)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:

Left = ATGCTTCAGGATGACGTAATACA 
Right = AAATTCTCGCTGATTGGAGG 
Product Length = 113 

GGCmGTATGTCTTCAAAGTGA^ArTTTmAAGTArTACTTGTCC^TCC 
AATCAGCGAGAATTT 

Genbank ID: G03617

Description: WICGR: Random genome wide STSs
Search for GDB entry 

22. D18S876: 

Database ID: GATA-D18S876 (Also known as G09963, CHLC.GATA61E10,
CHLC.GATA61E10.P17745)

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats

Chromosome: Chr18

Primers:
Left = TCAAACTTATAACTGCAGAGAAOp
Right = ATGGTAAACCCTCCCCATTA
Product Length



Review complete sequence:

^ ^finrnr a att A ^ttt^<- AT^AAACTTATMCTGCAOAqAAC.QTTGCC 

C ACTATTTTATAC C AC AC AAC AGTATTC TTAG C C AGATTAC ATCTATCTAT 

CTATCTATCTATCTATCTATCXATCTATCTATCTATCTATCATCTATCTAGC 

TAr;rTATrTATrTATAr;AAr. TAATGGGGAGG GTTTACCATGTTTGGGT^GA 

ACCCAAACATTTTATGGNCAAGGGNTTGGAAMTTACCCTTATCTACA^ 

TNTTNAACTTGTTTTGGTAGGNGTGNTAATTCCNTGGGNTTGGAANAACT 

TTTGNAATTTCCTCNmGTTTNTNATTNNNI^TTNNTNNNCATTATTNTGG 

ggtnttcngggtggagggctnantttggccncccgggtccnnc^ngc 

NAGTNGGNNNGGNTNNTNGGGTTTNCTTGGGAANCNTNCCNCCTNCNG 
GGGNTTCANGGGNT7%TNTTTNNTTG 



Genbank ID: G09963
Description: human STS CHLC.GATA61E10.P17745 clone GATA61E10.

Search for GDB entry



23. GCT3G01: 

Database ID: GCT-P10825 (Also known as G09484, CHLC.GCT3G01,
CHLC.GCT3G01.P10825)

Source: CHLC: genetically mapped polymorphic tetranucleotide repeats
Chromosome: Chr18 



Primers: 

Left = CTTTGCAATCTTAGTTj 
Right - GAACTATGATATGGA^ 
Product Length = 128 
Review complete sequence: 

AGATGTTTAACTTTGCAATCTTAJ 



TTGGC 
TAACAGCG 



CCACAACTTTTATTC GATATT7 



iTTAATTGGC AGAAATGAAATTTAGTTT 
kCACCACCACCATCAGCAGCAGCAGC 



■CCATATCATAGTTCA GAGCATTTAAA 

PAA 



AGCAGCAGCAGCATCGCJ^TTACv 

GNGGTCAAAATATACAACTAGGCT\GACACCNGNATAAGGTTTAATT 

ACCNGNGGTCTNCCCTCTMGGNQGNTTTT TTTTTC TTGNCNTGGCTTCT 

TTTTCCNTTTGCTTTTGTAAAATATQMGGNATTTTTGGGTTNTTCNTGGN 



ANTTNNCNNANTNNTNNTTNNNCNq 
CCCNNNTTGCCCCGGGGTTGNGTGi 
NAAGTTTNGGGGCCCT 



^GGNATT 

JCCCCCCNTTTGNGGCGGGGGTC 
JAGTAGGGGGGTCNCGGGTNNNG 



Genbank ID: G09484
Description: human STS CHLC.GCT3G01.P10825 clone GCT3G01.



24. WI-528: 

Database ID: MH232 (Also known as G03589, 528, D18S828)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:

Left = TTCTGCCVrTCCTGACTGTC
Right = TGTTTCOpATGTCTTGATGA
Product Length = 211

Genbank ID: G03589
Description: WICGR: Random genome wide STSs
Search for GDB entry

25. WI-1783:

Database ID: MR432 (Also known as G03587, _shu_31.Seq, 1783,
D18S824)
Source: WICGR: Random genome wide STSs
Chromosome: Chr18

Primers:

Left = CCAGTAATTAGACATuGACAGGTTC 
Right = TTTTACTAGACAGGCjTTGATAAACAA 
Product Length = 305 

CTGCTGCTTTTTGGGTTTCCTT^ 
CMTGGATAGTAAATAAmGTAl[G^ 

G AATAAG G G AAC AAC AATC AAG uAC ^^^^ ^^^^^I^^pTX^a 
AOTmGAGCTTTTGTAAAAAAG?^^ 

mCTAATCTCCmACAATTTTTTWTTnTTT^TnAAnr.CTGTCTAblAA 
AAATAATTCAGTTTCGGAATGTGG) 

Genbank ID: G03587 

Description: WICGR: Random genome wide STSs
Search for GDB entry 



26. D18S477:

Database ID: AFM301XF5 (Also known asVoixfS, Z2 ™*™SW
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs
Chromosome: Chr18

Primers:

Left = GGACATCCTTGATTTGCTCATAA
Right = GATTGACTGAAAACAGGCACAT
Product Length = 243

gcac^ta^ctc^^ 

rTm^ ATTTTGCTA^AAAGAGAGAAAACTAACAGA 

TT^G^^A^XA^ATgVaTTG ^^^'^^JJJ7?^JS^J^jq^q^y^ 
GGGCTAAGGAGAGTGACATCTGGGCTACATTAAAAGGACAGTCACATTG 

CTCAAAGNACTCAAGTTTAGCCCGAGTACAGTAGCT 

Description: H. sapiens (D18S477) DNA segment containing (CA) repeat;
clone

Search for GDB entry



27. D18S979: 

Database ID: GATA-P28080 (Also known as G08015, CHLC.GATA92C08,
CHLC.GATA92C08.P28080)
Source: CHLC: genetically mapped polymorphic tetranucleotide repeats

Chromosome: Chr18

Primers:

Left = AGCTTGCAGATAGCOTGCTA
Right = TACGGTAGGTAGGTAGATAGATTCG
Product Length = 155 

Review complete sequence: i ^..^.^i^rTPAA 
CTCTACAGTCTCTNACCTTTGGkcTCCAGGACmCAC^ 

CATTCCCACTGGGTTCTCAGGA^TTTATAGTTGTACTGAGCC^ 

GGATCCTAGGGTCTCCAGCTIGbAGAJ^^ 

nGTMTAAGGTGAGTCAATTCTbcCAATAAACCiA^^ 

atctatctatctatctatctatctXtctatatctatcatcw 

rTATCTACCTACCTACCGTA TTAQ^TTCTGTCTCTCTGGAGN 

Genbank ID: G08015

Description: human STS CHLC.GATA92C08.P28080 clone GATA92C08.

28. WI-9340: 

Database ID: UTR-05134 (Also known as G06102, D18S1034, 9340,
X60221)
Source: WICGR: Primers derived from Genbank sequences

Chromosome: Chr18

Primers:
Left = TGAGAGAACGAAATCTCTATCQG
Right = AGGCAGCAAGTTTTTATAAAGyC
Product Length = 115
Review complete sequence:
ATGTATCTATCCCMTTGAGTCAGCTAGAAACAGTTGACTGACTAAATGG
AftTTTArcCTTCCTAAAAATGAAAAGTTTGTTTCATATAGTGAGAQAACGA 

A^TrOTTCCGTGATC AftTCTGAATAAG 

G C CT G A C T AAAG ATT AAC A G G TTATA GTTT AAATTT G TAATTAATTCTAC C 
ATCTTGCAATAAAGTGAOAATTGAATG 

Genbank ID: G06102

Description: WICGR: Random genome wide STSs
Search for GDB entry 



29. D18S466: 

Database ID: AFM094YE5 (Also known as 094ye5, Z23354, D18S466)
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs
Chromosome: Chr18

Primers:

Left = ACACTGTAGCAGAGGCTVTGACC
Right = AGGCCAAGTTATGTGCCACC
Product Length = 214

Review complete sequence:
nnit3 ^.o^ t tf^, :ioao<7 > aa r a r>qf a ncaoa \]octtQacc accacccagttctcactagcactgagg 

atgctctattggttgggttacccacacacgcatagaiatgcacacacacagacacacagacacacacac 

acacacacacacaccagatatagcattccaaacckcaatatgctatgcaatactgcattaacaggtc^g 

rrtn tnntnnracataacttQocct aQaaaatactgkgacgtctgcattcccttUattatcgaattgacttact 

tggcttctgagttttcctcagaagtaatacttcaatacci^ttccatttctgccttgancattgtttggggtaccaag 

tatagct 

Genbank ID: Z23354

Description: H. sapiens (D18S466) DNA segment containing (CA) repeat;
clone 

Search for GDB entry 



30. D18S1092: 

Database ID: AFMA112WE9 (Also known as D18S1092, w5374, a112we9)
Source: J Weissenbach, Genethon: genetically mapped polymorphic STSs
Chromosome: Chr18



Primers: 

Left = CTCTCAAAGTAAGAGCGATGTTGTA^ 
Right = CCGAAGTAGAAAATCTTGGCA 
Product Length = 153 
Review complete sequence: 
Search for GDB entry

31. D18S61:

Chromosome: Chr18

Primers:

Left = ATTTCTAAGAGGACTCCCAAACT
Right = ATATTTTGAAACTCAGGAGCAT
Product Length = 174

Review complete sequence: \ rr ^ AAAAT TTCTAAGA 

CGTCTOCCAAACCAACAT^TATA^^^ 

ftttACTCCCAAAClA^^^ 
NA^^AC^ 

CCCTTCAAATCNTAGCATAMTTCC^^^ 
r. A r,TTTCAAAATAT TGGGTGGTTCGAAGTTCGAAU^MM 

TAGTGTCTATTANTTGTTGGACAGCTl 

Description: H. sapiens (D18S61) DNA segment containing (CA) repeat;

clone

Search for GDB entry 
Markers (STRs) used in refining the candidate region.

Below, the markers are shown that were used in refining the
candidate region. Most of these markers were described above and
will therefore only be mentioned by their name; for the new markers,
the information is given here.

New data:
1. D18S51:



Other names: UT574, (D18S379) 

Primer sequences:
UT574a GAGCCATGTTGATGCCACTG
UT574b CAAACCCGACTACCAGCAAC

^Sagga 

TGTCTCTACAAAAAAATACAAAAATNIAGTC 
GTAGTCTCAGCTACTTGCAGGGCTaAGGCAGGAGGAG^^ 

GAAGGUAAGGCTGCAGT^^^ 

gagtgacamttgagaccttgtctx^ 

GAAAGAAAGAAAGAANGAAAGAAAG^ 

ACACACCAGAGMGTTAAmTAAm^ 
CCAACATGTCCACCTTAGGCTGACGGt^GT^ 

G CTAAAGTG AGCTT AAT G CTGATC G AC T&TAG AG 
GenBank ID: L18333
2. D18S345.
Other name: UT575
Primer Pairs:

Primer A: TGGAGGTTGCAATGAGCTG
Primer B: CATGCACACCTAATTGGCG
GenBank ID: L26588
3. D18S817.
Other name: UT6365 
Primer Pairs: 



Primer A: GCMAGCAGAAGTGAdCATG 
Primer B: TAGGACTACAGGCGTGYGC 



DNA Sequence:

CATATGGGTCCACMG^ 
TCTACTGAGGGNCATAAGGCAGM^ 

CAGAAGTGAGCAIGJA^^ 
AAGAAMGMGGAMGTTC^ 
GCAGAAGGATTGCTTGAGCC^ 
GTGAGACTCCATCTCTGCATA^ 

™g£S 

m Kixri AfZCC ATGATC \ 



N NTG AG C C ATGATC 
GenBank ID: L30552
Characterisation of YACs.

Eight YACs were selected covering the candidate region and flanking the gap.
These YACs were further characterised by determining the end-sequences
by the Inverse-PCR protocol.
Selected YACs: 961_h_9, 942_c_3, 766_f_12, 731_c_7, 907_e_1, 752_g_8,
717_d_3, 745_d_2

New STSs based on end-sequences (unless indicated otherwise, the STSs
were tested on the monochromosomal mapping panel for identifying the
chimaerity of the YAC; if the STS revealed a hit not on chromosome 18q,
i.e. a chimaeric YAC, then it is indicated in the text below).



1. SV32L.

Derived from YAC 745_d_2 left arm end-sequence.

Primer A: GTTATTACAATGTCACClCTCATr 
Primer B: ACATCTGTAAGAGCTTClACAAACA 

DNA-sequence: 

CjCTIACAGATGTrCTTAAGTAA^ ^ 
ACTACACATATTTATCAATAATAGTTOACAAATACATTTTCAAATT 

Amplified sequence length: 107 basepairs (bp)
This STS has no clear hit on the monochromosomal mapping panel.



2. SV32R.

Derived from YAC 745_d_2 right arm end-sequence.

Primer A: ACGTTTCTCAATTGTTTAGTC 
Primer B: TGTCTTGGCATTAi m nAC 

DNA sequence: 

AGACMTGGGAGAMTTGCACJGC<^ 

C CATAC AGCTG C C G^ATGTGATC AT^GCAAGTCA^^^ i 

TTTAGTCATTTGTAAGACAAAAAGACTGGTTGG^ 

ATr5rTCCTTCAGGTTTAACAAGCAATAMTGATACTCTTCAGT 

^^^?^GACmAAATTAAA^CCAAA<^AGATATC 

Amplified sequence length: 127 bp
This STS has no clear hit on the monochromosomal mapping panel.



3. SV11L.

Derived from YAC 766_f_12 left arm end-sequence.

Primer A: CTATGCTCTGATCTTTGTTACTTT 
Primer B: ATTAACGGGAYVAGAATGGTAT 

DNA sequence: 

AATGTAGCAGTTA 

Amplified sequence length: 118 bp

This STS has a hit with chromosome 18 and must be located between
CHLC.GATA-p6051 and D18S968.



4. SV11R.

Derived from YAC 766_f_12 right arm end-sequence.

Primer A: AAGGTATATTATTTGTgTCG
Primer B: AAACTTTTCTTAACCTCATA

DNA sequence:

ATGAGGTTAAGAAAAGTTTA 
Amplified sequence length: 119 bp. 

This STS has a hit with chromosome 18 and must be located between
D18S876 and GCT3G01.



5. SV34L.

Derived from YAC 717_d_3 left arm end-sequence.

Primer A: TCTAC ACATATGG GAAAG C AffiG AA
Primer B: GCTGGTGGTTTTGGAGGTAGy
DNA sequence:

ACATAAAATGTCGCTCA^AACAATTATGTOTGIS^Q^^^^^^^^ 
G^AGGA^CAAATTTGTTTACAACATACATTAC^ 

^y^T^"^^jrrTflrrTrn ^AJACCACCAGCA CNGTCCGCAATAACTATAC 
ATC 

Amplified sequence length: 93 bp 
This STS has a hit with chromosome 18. 



6. SV34R. 

Derived from YAC 717_d_3 right arm end-sequence.

Primer A: ATAAGAGACCAGAATGTGATA 
Primer B: TCTTTGGAGGAGGGTAGTC 

DNA-sequence: 

AATATC ATTC TTC AC C C AC GTTAt\c ^^^^^'^^^^^^ ( ^7^qI 

catctcacatggaaaaatctgctgVgatcagttcctga^cttgct^ 

TCCTCCCTTAGGAMGTAQA>VAAAtCTTTTTGA^ 
CAATGAAAATTAGGTGAAGCTACAGAAGCCAGAAATT^ 

ACAATTATTT/VAGANGACCAATTGT^mGGTC 

Ar.TAr^CTCCTCCAAAGAA TTCACT^GCCGTCGTTTTACAACGTCNTGA 

Amplified sequence length: 244 bp 
This STS has a hit with chromosome 1; therefore YAC 717_d_3 is chimaeric.



7. SV25L.

Derived from YAC 731_c_7 left arm end-sequence.

Primer A: AAATCTCTTAAGCTCATGCTAGTG 
Primer B: CCTGCCTACCAGCCTGTC 

DNA sequence: 

agtggagagatagaaagagaggaagatttVt^ 
catgctagtg taggtgctggcaggtctg4cactctgtag gacaggctg 

gtaggcagg aa 

Amplified sequence length: 72 bp 

This STS has no clear hits on the monochromosomal mapping panel.
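
The "Amplified sequence length" quoted for each STS is the span on the template from the first base of one primer to the last base of the other. The following is a minimal sketch of that calculation, assuming exact primer matches and primers on opposite strands. The function names are illustrative, and the template is synthetic: the two SV25L primers are placed around a made-up 20-base spacer, so the length it yields is not the 72 bp reported above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, primer_a: str, primer_b: str):
    """Length of the product primed by primer_a (forward strand) and
    primer_b (reverse strand); None if either primer has no exact match."""
    template = template.upper()
    start = template.find(primer_a.upper())
    if start < 0:
        return None
    # primer_b anneals to the opposite strand, so search for its
    # reverse complement downstream of primer_a.
    site_b = template.find(reverse_complement(primer_b.upper()), start)
    if site_b < 0:
        return None
    return site_b + len(primer_b) - start


# SV25L primers as listed in the text.
PRIMER_A = "AAATCTCTTAAGCTCATGCTAGTG"
PRIMER_B = "CCTGCCTACCAGCCTGTC"

# Synthetic template (assumption): flank + primer A + made-up spacer
# + reverse complement of primer B + flank.
SPACER = "ACGT" * 5
TEMPLATE = "GG" + PRIMER_A + SPACER + reverse_complement(PRIMER_B) + "TT"

product = amplicon_length(TEMPLATE, PRIMER_A, PRIMER_B)
print(product)
```

With a real template the same function can be used to check a listed product length against the listed primer pair.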
8. SV25R.

Derived from YAC 731_c_7 right arm end-sequence.

Primer A: TGGGGTGCGGTGTGTTGT
Primer B: GAGATTTCATGCATTCCTGTAAGA

DNA-sequence:

CATGAAATCTC CC ANC C C CfCTTGTTG GAAATTTC C CTCAC TTT 
Amplified sequence length: 136 bp 

This STS has a hit with chromosome 7; therefore YAC 731_c_7 is chimaeric.

9. SV31L.

Derived from YAC 752_g_8 left arm end-sequence.

Primer A: GAGGCACAGCTTACCAGTTCA 
Primer B: ATTCATTTTCTCATTTTKTCC 

DNA-sequence: 
CTTCTCNATGANTGGACAMTGTaA^GGGTCAG 

GTAATTG C GTC ACTTT G G AG G AATTATTTG AC ^TlXL ^^^p^r GATAA 

Sac^c^tgagggtgaagttagW^ 

AATfiAGAAAATGAAT TNAGTGCTTAAGACAATGCTTGGTAACTACj i 
CCG 1 
Amplified sequence length: 178 bp 

This STS has a hit with chromosome 18 and must be located between
D18S876 and GCT3G01. 



10. SV31R. 

Derived from YAC 752_g_8 right arm end-sequence.

Primer A: CAAGATTATGCCTCAACT
Primer B: TAAGCTCATAATCTCTGGA
DNA sequence:
AAACTTTAACCAA1 




GTTTCCTGG 

Amplified sequence length: 131 bp

This STS has no clear hits on the monochromosomal mapping panel and
gives no information concerning the chimaerity of the YAC.



11. SV10L 

Derived from YAC 942_c_3 left arm end-sequence. 

Primer A: TCACTTGGTTGGTtUcaTTACT 
Primer B: TAGAAAAACAGTTGQATTTGATAT 

DNA-sequence: 

Amplified sequence length: 130 bp 

This STS has a hit with chromosome 18 and must be located between
CHLC.GATA-p6051 and D18S968.

12. SV10R.

Derived from YAC 942_c_3 right arm end-sequence.

Primer A: AACCCAAGGGAGCACAACTC
Primer B: GGCAATAGGCTTTCCAACAT 

T^GGTScCTAGGm 

catgtgcaggtctccgtgtggacatmtV^ 

ATTTAAAAAACCTGGG^ 

ftTr rrp)»r^*^^ T ^-™ r - AAAnfiCTA ^ c - c - ANCAT 

Amplified sequence length: 135 bp 
This STS has a hit with chromosome 18 and must be located between
D18S876 and GCT3G01.

13. SV6L.

Derived from YAC 961_h_9 left arm end-sequence.

No primer was made, because this sequence is identical to a known STR
marker D18S42, which is indeed mapped to this region.

Primer A: 
Primer B: 

DNA sequence: 

GTTGCGGTTGTCACTTGGT^AACAAAATAAGTC 
Amplified sequence length: 

SV6L recognises D18S42, which must therefore be located between WI-7336
and WI-8145.



14. SV6R.

Derived from YAC 961_h_9 right arm end-sequence.

Primer A: TTGTGGAATGGCTAAQ3T 
Primer B: GAAAGTATCAAGGCA<|TG 

DNA sequence: 

tmttoacmatamaat^^^ 



TTCCATATTATTTACTTTNNGC 



TIM. ' ^ 
CCCTTNCATCTAACAAATATATATTCAGT I TCTATAATGTGTCTGACACTG 



Amplified sequence length: 122 bp 

SV6R amplifies a segment on chromosome 18. This segment must be located
between WI-2620 and WI-4211.

15. SV26L.

Derived from YAC 907_e_1 left arm end-sequence.

Primer A: TAtTTGGTTTGTTTGCTGAGGT
Primer B: CaAgaAGGATGGATACAAACAAG

DNA sequence:

^Tr^g^G GGMGGCTTTACAGGCATTCAAAAGG 

Amplified sequence length: 154 bp 

This STS has a hit with chromosome 13; therefore YAC 907_e_1 is
chimaeric.



16. SV26R.

Derived from YAC 907_e_1 right arm end-sequence.

Primer A: CGCTATGCATGGATTTA 
Primer B: G CTG AATTTA G GJ^TGTAA 

DNA sequence: 
C^CWGCAIGGAIIIAM 

TCTTATTCTAG GTTC C T AATATTTA r AJC, CT AAATTC AG CT 
Amplified sequence length: 90 bp

This STS has no clear hits on the monochromosomal mapping panel: no
information concerning chimaerity at this side of the YAC.

Testing of 3 end-sequences flanking the gap in additional YACs: STS-
markers WI-4211, D18S876 and GCT3G01 are also shown in order to
identify YACs on opposite sides of the gap more clearly in Table 3 below.








STSs 


YACs 


WI-4211 


D18S876 


SV31L 


SV11R 


SV10R 


GCT3G01 


940_b_1
766_f_12
846_a_5


+ 
+ 
+ 


+ 

- ? 


+ 
+ 
+ 


+ 
+ 






752_g_8


+ 




+ 








745_d_2 


+ 


+ 


+ 


+ 






96l"cJ 
942_c_3
717_d_3
972 eJ1 
940JM0 




+ 
+ 


+ 


+ 
+ 


+ 

+ 


+ 

+ 
+ 


821_e_7










+ 


+ 


731_c_7














889~<f4 
907_e_1








+ 


+ 
+ 


+ 






+: positive hit / -: no hit / ?: 2 instances were observed in which a positive
hit was expected (on the assumed order of the markers) but not
observed. The reasons for this are not clear.

YAC 745_d_2 was excluded from further analysis since there was no clear
hit with chromosome 18. Of the remaining 7, it was determined from a
monochromosomal mapping panel that 3 were chimaeric and 4
non-chimaeric, as shown in Table 4 below.
TABLE 4

YAC              chimaeric   chromosome
961_h_9 (6)      no
942_c_3 (10)     no
766_f_12 (11)    no
731_c_7 (25)     yes         chromosome 7
907_e_1 (26)     yes         chromosome 13
752_g_8 (31)     no
717_d_3 (34)     yes         chromosome 1
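
The chimaerity calls can be retraced from the per-STS statements in the end-sequence list. The sketch below encodes one possible reading of the rule, which is an assumption and not a procedure stated in the text: a YAC is called chimaeric when either end-STS hits a chromosome other than 18, non-chimaeric when no end hits another chromosome and at least one end hits chromosome 18, and uninformative when neither end gives a clear hit. SV6L is entered as a chromosome 18 hit because it is identical to marker D18S42.

```python
TARGET = "18"

# Chromosome hit of the (left, right) end-STS of each YAC as stated in the
# text; None means no clear hit on the monochromosomal mapping panel.
END_HITS = {
    "961_h_9": ("18", "18"),    # SV6L (identical to D18S42), SV6R
    "942_c_3": ("18", "18"),    # SV10L, SV10R
    "766_f_12": ("18", "18"),   # SV11L, SV11R
    "731_c_7": (None, "7"),     # SV25L, SV25R
    "907_e_1": ("13", None),    # SV26L, SV26R
    "752_g_8": ("18", None),    # SV31L, SV31R
    "717_d_3": ("18", "1"),     # SV34L, SV34R
    "745_d_2": (None, None),    # SV32L, SV32R
}


def classify(left, right, target=TARGET):
    """Classify a YAC from the chromosome hits of its two end-STSs."""
    hits = [h for h in (left, right) if h is not None]
    foreign = sorted({h for h in hits if h != target})
    if foreign:
        return "chimaeric (chromosome " + ", ".join(foreign) + ")"
    if hits:
        return "not chimaeric"
    return "uninformative"


calls = {yac: classify(*ends) for yac, ends in END_HITS.items()}
for yac, call in calls.items():
    print(yac, call)
```

Under this reading the three chimaeric YACs and their foreign chromosomes match Table 4, and 745_d_2 comes out as uninformative, consistent with its exclusion from further analysis.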



For the non-chimaeric YACs the STS based on the end-sequence
flanking the gap (10R, 11R, 31L) was tested
on 14 YACs flanking the gap. Overlaps between YACs
on opposite sides of the gap were demonstrated: e.g. the
"11R" end-sequence (766_f_12) detects YAC 766_f_12
and YAC 907_e_1.

YACs were then selected comprising the minimum tiling 
path: 



TABLE 5

YAC         size      chimaerity
961_h_9     1180 kb   not chimaeric
766_f_12    1620 kb   not chimaeric
907_e_1     1690 kb   chimaeric (chr. 13)



These three YACs are stable as determined by PFGE 
and their sizes roughly correspond to the published 
sizes. These YACs were transferred to other host- 
yeast strains for restriction mapping. 
Experimental 2
Construction of fragmentation vector:

A fc.5kb EcoRI/SalI fragment of pBLC8.1 (Lewis et
al., 1992) carrying a lysine-2 and a telomere sequence
was directionally cloned into GEM3zf(-) digested with
EcoRI/SalI. Subsequently, an End Rescue Site was
ligated into the EcoRI site. Hereto, two
oligonucleotides (strand 1: 5'-TTCGGATCCGGTACCATCGAT-3'
and strand 2: 3'-GCCTAGGCCATGGTAGCTATT-5') were
ligated into a partial (dATP) filled EcoRI site,
generating the vector pDF1. Triplet repeat containing
fragmentation vectors were constructed by cloning of a
21bp and a 30bp CAG/CTG adapter into the Klenow-filled
PstI site of pDF1. Transformation and selection
resulted in a (CAG)7 and a (CTG)10 fragmentation vector
with the orientation of the repeat sequence 5' to 3'
relative to the telomere.
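
The two End Rescue Site oligonucleotides can be checked for complementarity directly from the sequences given above. The sketch below is only an illustration: the function names are hypothetical, and the enzyme names attached to the recognition sequences are annotations added here, not statements from the text. It pairs the two strands, reports the single-stranded ends, locates three recognition sequences in strand 1, and confirms that seven CAG and ten CTG units give the 21bp and 30bp adapter lengths.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top, bottom):
    """Pair two 5'->3' oligos that leave 5' overhangs on both ends.

    Returns (top 5' overhang, paired region on the top strand,
    bottom 5' overhang), or None if no such pairing exists.
    """
    rc_bottom = reverse_complement(bottom)
    for k in range(len(top)):
        paired = top[k:]
        if rc_bottom.startswith(paired):
            return top[:k], paired, bottom[:len(bottom) - len(paired)]
    return None


# Oligonucleotides as given in the text.
strand1 = "TTCGGATCCGGTACCATCGAT"            # written 5'->3'
strand2 = "GCCTAGGCCATGGTAGCTATT"[::-1]      # text gives 3'->5'; flip to 5'->3'

# Recognition sequences; the enzyme names are this sketch's annotation.
SITES = {"BamHI": "GGATCC", "KpnI": "GGTACC", "ClaI": "ATCGAT"}

duplex = anneal(strand1, strand2)
site_positions = {name: strand1.find(site) for name, site in SITES.items()}
adapter_lengths = (len("CAG" * 7), len("CTG" * 10))

print(duplex)
print(site_positions)
print(adapter_lengths)
```

The 19 paired bases leave a two-base 5'-TT overhang on each strand, which is what would be expected to pair with the 5'-AA overhang remaining on an EcoRI end after partial filling with dATP.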

Yeast transformation: 



Linearised (digested with SalI) vector was used
to transform YAC clones 961_h_9, 766_f_12 or 907_e_1
using the LiAc method. After transformation the YAC
clones were plated onto SDLys⁻ plates to select for
the presence of the fragmentation vector. After 2-3
days colonies were replica plated onto SDLys⁻-Trp⁻-Ura⁻
and SDLys⁻-Trp⁻-Ura⁺ plates. Colonies growing on the
SDLys⁻-Trp⁻-Ura⁺ plates but not on the SDLys⁻-Trp⁻-Ura⁻
plates contained the fragmented YACs.

Analysis of fragmented YACs:

Yeast DNA isolated from clones with the correct
phenotype was analysed by Pulsed Field Gel Electrophoresis
(PFGE), followed by blotting and hybridisation with
the Lys-2 gene, and the sizes of the fragmented YACs
were estimated by comparison with DNA standards of
known length.



End Rescue: 



Fragmented YACs characterised by a size common to
other fragmented YACs, indicative of the presence of a
major CAG or CTG triplet repeat, were digested with
one of the enzymes from the End Rescue site, ligated
and used to transform E. coli. After growth of the
transformed bacteria the plasmid DNA was isolated and
the ends of the fragmented YACs, corresponding to one
of the sequences flanking the isolated trinucleotide
repeats, were sequenced.
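
Picking out fragmented YACs "characterised by a size common to other fragmented YACs" amounts to grouping the PFGE size estimates and keeping the sizes that recur. The sketch below shows one way this could be done; the 10% tolerance, the function name and the individual sizes are all hypothetical. Only the group counts (five fragments near 650kb and two near 50kb among twenty) echo the figures reported in the Results for YAC 766_f_12.

```python
def recurrent_sizes(sizes_kb, tolerance=0.1, min_count=2):
    """Group fragment sizes and return the groups seen at least min_count times.

    Sizes are sorted; a size joins the current group when it lies within
    `tolerance` (as a fraction) of the smallest size in that group.
    """
    groups = []
    for size in sorted(sizes_kb):
        if groups and size <= groups[-1][0] * (1 + tolerance):
            groups[-1].append(size)
        else:
            groups.append([size])
    return [group for group in groups if len(group) >= min_count]


# Hypothetical PFGE size estimates (kb) for twenty fragmented YACs.
sizes = [48, 52, 120, 180, 240, 310, 370, 430, 500, 560,
         640, 645, 650, 655, 660, 760, 850, 980, 1100, 1300]

recurrent = recurrent_sizes(sizes)
print(recurrent)
```

Each recurrent group points to a candidate repeat site, and one or more fragments from the group would then go forward to End Rescue.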

Sequencing revealed that fragmented YACs of an
equal length were all fragmented at the same site. A
BLAST search of the GenBank database was performed
with the identified sequences to identify homology
with known sequences. The complete sequence spanning
the CAG or CTG repeats of the fragmented YACs was
obtained by cosmid sequencing, employing sequence
specific primers and splice primers, as previously
described (Fuentes et al. 1992 Hum. Genet. 101: 346-350)
or by using the "genome walker" kit (Clontech
Laboratories, Palo Alto, USA) as described in Siebert
et al. Nucleic Acids Res (1995) 23(6): 1087-1088 and
Siebert et al. (1995) CLONTECHniques X(II): 1-3.



Results



A YAC 961_h_9 clone was transformed with the
(CAG)7 or (CTG)10 fragmentation vector. The CTG vector
did not reveal the presence of any CTG repeat.
Analysis of twelve (CAG)7 fragmented YACs showed that
five of these had the same size of approximately
100kb. End Rescue was performed with EcoRI and
sequencing of three of these fragments revealed that
they all shared the terminal sequence shown in italics
in Figure 15a. A BLAST search of the GenBank database
with this sequence indicated the presence of a
sequence homology with the CAP2 gene (GenBank
accession number: L40377). The sequence spanning the
CAG repeat shown in Figure 15a was obtained by both
cosmid sequencing and genome walker sequencing. The
sequence was mapped between markers D18S63 and WI-3170
by STS content mapping.



A YAC, 766_f_12, was fragmented using the
(CAG)7 or (CTG)10 fragmentation vector. Again the
(CTG)10 vector did not reveal the presence of any CTG
repeat. Analysis of twenty (CAG)7 fragmented YACs
showed the presence of two groups of fragments with
the same size: five of approximately 650kb and two
of approximately 50kb.

End Rescue was performed using EcoRI on four of
the fragmented YACs of 650kb. Sequencing confirmed
that they all shared identical 3' terminals,
characterised by the sequence shown in italics in
Figure 16a. A BLAST search showed homology of this
sequence with the Alu repeat sequence family. The
sequence spanning the CAG repeat shown in Figure 16a
was obtained by cosmid sequencing. The sequence was
mapped between markers WI-2620 and WI-4211 by STS
content mapping on the YAC contig map.
End Rescue was also performed on the two fragments of
50kb. Sequencing revealed the sequence shown in
italics in Figure 17a. A BLAST search revealed no
sequence homology with any known sequence. Cosmid
sequencing allowed identification of the complete sequence
spanning the CAG repeats, shown in Figure 17a. The
sequence was mapped between markers D18S968 and
D18S875 by STS content mapping on the YAC contig map.

A YAC 907_e_1 clone was transformed with the
(CAG)7 or (CTG)10 fragmentation vector. The (CAG)7
vector did not reveal the presence of any CAG repeat.
Analysis of twenty-six (CTG)10 fragmented YACs revealed
that twenty-one of them had the same size of
approximately 900kb. End Rescue was performed with
KpnI on three fragmented YACs of this size. Sequencing
revealed the nucleotide sequence shown in italics in
Figure 18a. A BLAST search indicated the presence of
a homology of this sequence with the GCT3G01 marker
(GenBank accession number: G09484). The sequence
spanning the CTG repeat was obtained from the GenBank
Database. The sequence was mapped between markers 10R
and WI-528.